Abstract. We prove that under some regularity and strong identifiability conditions, around a mixing distribution with $m_0$ components, the optimal local minimax rate of estimation of a mixture with $m$ components is $n^{-1/(4(m-m_0)+2)}$. This corrects a previous paper by Chen (1995) in The Annals of Statistics.


1. Introduction. Let $\{f(x,\theta)\}_{\theta\in\Theta}$ be a family of probability densities with respect to some $\sigma$-finite measure $\lambda$. The parameter set $\Theta$ is always assumed to be a compact subset of $\mathbb{R}$ with non-empty interior. A finite mixture model with $m$ components is given by

(1)    $f(x,G) = \int_\Theta f(x,\theta)\, dG(\theta)$

where $G$ is a distribution on $\Theta$ with $m$ support points, called the mixing distribution. The class of such $m$-mixing distributions $G$ is denoted by $\mathcal{G}_m$, and $\mathcal{G}_{\le m}$ will be the union of the $\mathcal{G}_j$ for $j \in \llbracket 1, m \rrbracket$.
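To make (1) concrete, here is a minimal sketch that evaluates a mixture density for a mixing distribution with finitely many atoms. The Gaussian location family and the names `gaussian_density` and `mixture_density` are assumptions made for illustration only; the text does not fix a particular family at this point.

```python
import math

def gaussian_density(x, theta):
    """Density of N(theta, 1) at x; stands in for f(x, theta) (illustrative choice)."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)

def mixture_density(x, mixing, density=gaussian_density):
    """Evaluate f(x, G) = sum_j pi_j f(x, theta_j) for G given as [(pi_j, theta_j), ...]."""
    total_weight = sum(pi for pi, _ in mixing)
    if abs(total_weight - 1.0) > 1e-12:
        raise ValueError("mixing weights must sum to one")
    return sum(pi * density(x, theta) for pi, theta in mixing)

# A two-point mixing distribution G = 0.3 * delta_{-1} + 0.7 * delta_{2}.
G = [(0.3, -1.0), (0.7, 2.0)]
value_at_zero = mixture_density(0.0, G)
```

For a discrete $G$ the integral in (1) reduces to a finite weighted sum, which is all the sketch computes.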

In Section 2 we will show that a consistent estimator $\hat G_n \in \mathcal{G}_{\le m}$ of an unknown mixing distribution $G_1$ cannot converge uniformly faster than $n^{-1/(4(m-m_0)+2)}$ in a neighborhood of $G_0 \in \mathcal{G}_{m_0}$, in the ($L^1$-)Wasserstein metric, where $n$ is the sample size. Recall that this metric can be defined by

(2)    $W(G_1,G_2) = \int_{\mathbb{R}} \left| G_1(-\infty,t] - G_2(-\infty,t] \right| dt$,

and that by the Kantorovich-Rubinstein dual representation,

(3)    $W(G_1,G_2) = \sup_{\|f\|_{\mathrm{Lip}} \le 1} \int_\Theta f(\theta)\, d(G_1 - G_2)(\theta)$.
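As an illustration of formula (2), the following sketch computes the $L^1$-Wasserstein distance between two discrete distributions on the real line by integrating the absolute difference of their distribution functions. The representation of a distribution as a list of (weight, atom) pairs and the name `wasserstein1` are assumptions of this example, not notation from the paper.

```python
from fractions import Fraction

def wasserstein1(g1, g2):
    """L1-Wasserstein distance between two discrete distributions on the real line.

    Each distribution is a list of (weight, atom) pairs; the distance is the
    integral of |G1(-inf, t] - G2(-inf, t]| over t, as in formula (2).
    """
    atoms = sorted({a for _, a in g1} | {a for _, a in g2})
    cdf1 = cdf2 = 0
    total = 0
    for left, right in zip(atoms, atoms[1:]):
        # Both CDFs are constant on [left, right): update them at `left`.
        cdf1 += sum(w for w, a in g1 if a == left)
        cdf2 += sum(w for w, a in g2 if a == left)
        total += abs(cdf1 - cdf2) * (right - left)
    return total

# Half the mass at 0 and half at 1, against a single atom at 1/2.
G1 = [(Fraction(1, 2), Fraction(0)), (Fraction(1, 2), Fraction(1))]
G2 = [(Fraction(1), Fraction(1, 2))]
distance = wasserstein1(G1, G2)
```

Exact rational arithmetic is used so that the result can be compared with a hand computation: each half of the mass travels a distance 1/2.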

Primary 62G05; secondary 62G20. 

Keywords and phrases: Local asymptotic normality, convergence of experiments, maximum likelihood estimate, Wasserstein metric, mixing distribution, mixture model, rate of convergence, strong identifiability.
In Section 3, we prove that the rate $n^{-1/(4(m-m_0)+2)}$ is optimal, under strong identifiability conditions. Finally, Section 4 exhibits natural families satisfying these strong identifiability conditions.

Some auxiliary or overly long computations are postponed to Appendix A.

2. The optimal rate cannot be better than $n^{-1/(4(m-m_0)+2)}$. The main idea is to build families of mixing distributions $G_n(u)$ with the same first $2(m-m_0)$ moments, and $u$ as rescaled shifted $(2(m-m_0)+1)$-th moment. Hence the Wasserstein distance between $G_n(u_1)$ and $G_n(u_2)$ will be of order $n^{-1/(4(m-m_0)+2)}$, and they need $n$ observations to be told apart.

Theorem 2.4 makes this precise. We first need a few tools.

We give a far-from-general definition of local asymptotic normality (Le Cam, 
1986), but it is sufficient for our purposes. 

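Definition 2.1. Given densities $f_{n,u}$ with respect to a measure $\lambda$, consider the sequence of experiments $\mathcal{E}_n = \{f_{n,u}, u \in \mathcal{U}_n\}$ with each point of $\mathbb{R}$ in $\mathcal{U}_n$ for $n$ large enough. Let $X$ have density $f_{n,0}$ and consider the log-likelihood ratios:

$Z_{n,0}(u) = \log \frac{f_{n,u}(X)}{f_{n,0}(X)}$.

Suppose that there is a positive constant $\Gamma$ and a sequence of random variables $Z_n$ with $Z_n \xrightarrow{d} \mathcal{N}(0,\Gamma)$, such that for all $u \in \mathbb{R}$:

(4)    $Z_{n,0}(u) - u Z_n + \frac{u^2}{2}\Gamma \xrightarrow[n\to\infty]{P} 0$.

The sequence of experiments is said to be locally asymptotically normal (LAN) and converging to the Gaussian shift experiment $\{\mathcal{N}(u\Gamma,\Gamma), u \in \mathbb{R}\}$.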

Of course, here $\xrightarrow{d}$ (resp. $\xrightarrow{P}$) stands for convergence in distribution (resp. in probability). Intuitively, (almost) anything that can be done in a Gaussian shift experiment can be done asymptotically in a locally asymptotically normal sequence of experiments.
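As a short check of why the LAN expansion has this shape, one can compute the log-likelihood ratio in the Gaussian shift experiment itself. Writing $z$ for an observation and comparing $\mathcal{N}(u\Gamma,\Gamma)$ with $\mathcal{N}(0,\Gamma)$, the normalising constants cancel and only the exponents remain:

```latex
\log \frac{d\mathcal{N}(u\Gamma,\Gamma)}{d\mathcal{N}(0,\Gamma)}(z)
  = -\frac{(z-u\Gamma)^2}{2\Gamma} + \frac{z^2}{2\Gamma}
  = uz - \frac{u^2}{2}\,\Gamma .
```

So in the limit experiment the log-likelihood ratio is exactly $uZ - \frac{u^2}{2}\Gamma$ with $Z \sim \mathcal{N}(0,\Gamma)$ under $u = 0$, which is the expression that the LAN condition asks $Z_{n,0}(u)$ to approach.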

Definition 2.2. Let $\{f(x,\theta)\}_{\theta\in\Theta}$ be a family of densities with respect to a $\sigma$-finite measure $\lambda$. Let us consider, for $p \in \mathbb{N}$ and $q > 0$, the functions:

(5)    $E_{p,q} : \Theta^3 \to [0,\infty], \quad (\theta_1,\theta_2,\theta_3) \mapsto E_{\theta_1} \left| \frac{f^{(p)}(x,\theta_2)}{f(x,\theta_3)} \right|^q$.

We say that the family of densities is $(p,q)$-smooth if $E_{p,q}$ is well-defined and continuous on $\Theta^3$, and if there exists $\varepsilon > 0$ such that for all $\theta_1$,

(6)    $|\theta_2 - \theta_3| < \varepsilon \implies E_{p,q}(\theta_1,\theta_2,\theta_3) < \infty$.


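Example 2.1. Let us consider an exponential family with natural parameter $\theta \in \Theta_0$, so that $f(x,\theta) = h(x) g(\theta) \exp(\theta T(x))$, with $g \in C^\infty$. Consider $\Theta$ such that its $\varepsilon$-neighbourhood $\Theta \oplus B(0,\varepsilon)$ is included in $\Theta_0$. Then $\{f(x,\theta), \theta \in \Theta\}$ is $(p,q)$-smooth for any $p$ and $q$. Indeed,

$\left| \frac{f^{(p)}(x,\theta_2)}{f(x,\theta_3)} \right|^q = \left| \frac{h(x) e^{\theta_2 T(x)} \sum_{k=0}^{p} \binom{p}{k} g^{(k)}(\theta_2) T^{p-k}(x)}{h(x) g(\theta_3) e^{\theta_3 T(x)}} \right|^q = \frac{e^{q(\theta_2-\theta_3)T(x)}}{g(\theta_3)^q} \left| \sum_{k=0}^{p} \binom{p}{k} g^{(k)}(\theta_2) T^{p-k}(x) \right|^q$

so that

$E_{p,q}(\theta_1,\theta_2,\theta_3) = \frac{g(\theta_1)}{g(\theta_3)^q \, g(\theta_1 + q(\theta_2-\theta_3))} \, E_{\theta_1 + q(\theta_2-\theta_3)} \left| \sum_{k=0}^{p} \binom{p}{k} g^{(k)}(\theta_2) T^{p-k}(x) \right|^q$.

Since all the moments of the sufficient statistic $T(x)$ are finite under a distribution in the exponential family, and since $\theta_1 + q\theta_2 - q\theta_3$ is in $\Theta_0$ for $|\theta_2 - \theta_3| < \varepsilon/q$, we have finiteness of $E_{p,q}(\theta_1,\theta_2,\theta_3)$. Continuity is clear.

Being $(p,q)$-smooth ensures finiteness of similar integrals when some $\theta_j$ are replaced with mixing distributions with components close to the $\theta_j$: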

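Proposition 2.3. Given $\pi_0 > 0$ and two positive integers $m_0 \le m$, define mixing distributions

$G_n = \sum_{j=1}^{m} \pi_{j,n} \delta_{\theta_{j,n}}$

such that $\theta_{j,n} \to \theta_0$ for all $j \in \llbracket m_0, m \rrbracket$ and $\sum_{j=m_0}^{m} \pi_{j,n} \ge \pi_0$ for all $n$ large enough. Consider a $(p,q)$-smooth family of densities $\{f(x,\theta)\}_{\theta\in\Theta}$ with respect to some $\sigma$-finite measure $\lambda$.

Then there is a finite $C$ depending only on $\theta_0$ and $\pi_0$ such that for any $\theta$ satisfying $|\theta - \theta_0| \le \varepsilon/2$, for $n$ large enough, for any mixture $f(x,G)$:

$E_G \left| \frac{f^{(p)}(x,\theta)}{f(x,G_n)} \right|^q \le C$.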



P. HEINRICH AND J. KAHN 


If, in addition, the function $|f^{(p)}(x,\theta_0)|$ has nonzero integral under $\lambda$, then there is a positive $c$ depending only on $\theta_0$ such that for any mixture $f(x,G)$:

$E_G \left| \frac{f^{(p)}(x,\theta_0)}{f(x,G)} \right|^q \ge c$.


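Proof. For $n$ large enough, we have $|\theta_{j,n} - \theta_0| \le \varepsilon/2$ for all $j \in \llbracket m_0, m \rrbracket$. Hence $|\theta_{j,n} - \theta| \le \varepsilon$ for all $\theta$ such that $|\theta - \theta_0| \le \varepsilon/2$, so that we may use (6). By compactness and continuity, there is a finite $C$ such that

$E_{\theta_1} \left| \frac{f^{(p)}(x,\theta)}{f(x,\theta_{j,n})} \right|^q \le C$

for all such $(j,n)$ and all $\theta_1$. Since $f(x,G)$ is a convex combination of some $f(x,\theta_i)$, we may replace $\theta_1$ by $G$ in the former expression. Since the function $1/y^q$ is convex on positive reals, by Jensen's inequality, setting $\pi = \sum_{j=m_0}^{m} \pi_{j,n}$,

$\sum_{j=m_0}^{m} \frac{\pi_{j,n}}{\pi} \left| \frac{f^{(p)}(x,\theta)}{f(x,\theta_{j,n})} \right|^q \ge \left| \frac{f^{(p)}(x,\theta)}{\sum_{j=m_0}^{m} \frac{\pi_{j,n}}{\pi} f(x,\theta_{j,n})} \right|^q \ge \pi^q \left| \frac{f^{(p)}(x,\theta)}{f(x,G_n)} \right|^q$

and taking expectations with respect to $G$ we obtain the upper bound

$E_G \left| \frac{f^{(p)}(x,\theta)}{f(x,G_n)} \right|^q \le \frac{C}{\pi^q} \le \frac{C}{\pi_0^q}$.

The lower bound does not depend on $(p,q)$-smoothness. It is a simple consequence of rewriting:

$E_G \left| \frac{f^{(p)}(x,\theta_0)}{f(x,G)} \right|^q = \int \frac{|f^{(p)}(x,\theta_0)|^q}{f(x,G)^{q-1}} \, d\lambda(x)$

and noticing $\int |f(x,G)| \, d\lambda(x) = 1$ since $f(x,G)$ is a probability density. By assumption, there is a set $B$ of measure $\lambda(B) = M > 0$ on which the function $|f^{(p)}(x,\theta_0)|$ is more than some $\varepsilon > 0$. Now, the set $B \cap \{f(x,G) \le 2/M\}$ is of measure at least $M/2$ and thus

$\int \frac{|f^{(p)}(x,\theta_0)|^q}{f(x,G)^{q-1}} \, d\lambda(x) \ge \frac{M}{2} \, \varepsilon^q \left( \frac{M}{2} \right)^{q-1}$.

□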
Theorem 2.4. Let $m_0 < m$. Let $G_0 = \sum_{j=1}^{m_0} \pi_j \delta_{\theta_j}$ be a mixing distribution whose $m_0$-th component is in the interior of $\Theta$, that is $\theta_{m_0} \in \mathring{\Theta}$.

Then there are mixing distributions $G_n(u)$ ($n \ge 0$, $u \in \mathbb{R}$), all in $\mathcal{G}_m$, such that:

(i) $W(G_n(u),G_0) \to 0$ for all $u \in \mathbb{R}$. More precisely, for some $C(u) > 0$, we have

$W(G_n(u),G_0) \le C(u) \, n^{-1/(4(m-m_0)+2)}$.

(ii) The mixing distributions get closer at rate $n^{-1/(4(m-m_0)+2)}$: for all $u_1$ and $u_2$, there are constants $c(u_1,u_2) > 0$ such that

$W(G_n(u_1),G_n(u_2)) \ge c(u_1,u_2) \, n^{-1/(4(m-m_0)+2)}$.

(iii) Suppose that a family of densities $\{f(x,\theta), \theta \in \Theta\}$ with respect to $\lambda$ is $(p,q)$-smooth for all $p \in \llbracket 1, 2(m-m_0+1) \rrbracket$ and $q \in \llbracket 1, 4 \rrbracket$. Assume moreover that

$\int |f^{(2(m-m_0)+1)}(x,\theta_{m_0})| \, d\lambda(x) > 0$.

There is a number $\Gamma > 0$ and an infinite subset $N_0$ of $\mathbb{N}$ along which the experiments $\mathcal{E}_n = \{\bigotimes_{i=1}^{n} f(x_i, G_n(u)), |u| \le u_{\max}(n)\}$ with $u_{\max}(n) \to \infty$ converge to the Gaussian shift experiment $\{\mathcal{N}(u\Gamma,\Gamma), u \in \mathbb{R}\}$.

(iv) $u$ is the rescaled $(2(m-m_0)+1)$-th moment of the components of the mixing distribution near $\theta_{m_0}$.

The theorem shows that when the first moments of the components of the mixing distribution $G$ near $\theta_{m_0}$ are known, all remaining knowledge we may acquire is on the next moment, and that is the "right" parameter: it is exactly as hard to tell apart, say, 10 and 11 as 0 and 1.

On the other hand, for our original problem the cost function is the transportation distance between mixing distributions, so that an optimal estimator in mean square error for $u$ is not optimal for our original problem. Moreover, just taking the loss function $c(u_1,u_2)$ in the limit experiment runs into technical problems since this might go to zero as $u_2$ goes to infinity. They could be overcome, but it is easier to state a lower bound on risk using just contiguity and two points:

Corollary 2.5. The optimal local minimax rate of estimation around $G_0$ of a mixture cannot be better than $n^{-1/(4(m-m_0)+2)}$ in general: for any sequence of estimators $\hat G_n$ and any $\varepsilon > 0$, we have:

(7)    $\liminf_{n\to\infty} \; \sup_{G_1 \text{ s.t. } W(G_1,G_0) < n^{-1/(4(m-m_0)+2)+\varepsilon}} \; n^{1/(4(m-m_0)+2)} \, E_{G_1^{\otimes n}} W(G_1, \hat G_n) > 0$,

where the true distribution $G_1$ lies in $\mathcal{G}_m$.


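Proof of Corollary 2.5. Fix $u > 0$ and consider the densities $f_{n,u} = \bigotimes_{i=1}^{n} f(x_i, G_n(u))$ with associated probability measures $P_{n,u}$ as in Theorem 2.4 (iii). We have

(8)    $\liminf_{n\to\infty} \; \inf_{A : P_{n,0}(A) \ge 3/4} P_{n,u}(A) \ge \frac{1}{4} e^{-\frac{u^2}{2}\Gamma}$.

Indeed, the LAN property (4) can be written as

$\rho_n := \frac{f_{n,u}(X)}{f_{n,0}(X)} \, e^{-uZ_n + \frac{u^2}{2}\Gamma} \xrightarrow{P} 1$

with $X$ of density $f_{n,0}$ and $Z_n$ with asymptotic distribution $\mathcal{N}(0,\Gamma)$. For any event $A$,

$P_{n,u}(A) \ge E_{n,0}\left[ \frac{f_{n,u}(X)}{f_{n,0}(X)} \mathbf{1}_A \right] = E_{n,0}\left[ e^{uZ_n - \frac{u^2}{2}\Gamma} \rho_n \mathbf{1}_A \right]$.

Furthermore, by restriction on the event $\{Z_n > 0\}$ and by using $\rho_n \xrightarrow{P} 1$, we get that $P_{n,u}(A)$ is bounded below by

$e^{-\frac{u^2}{2}\Gamma} \left[ P_{n,0}(A) - P_{n,0}(Z_n \le 0) \right] + o(1)$.

Taking now the infimum on events $A$ such that $P_{n,0}(A) \ge 3/4$ and passing to the limit as $n \to \infty$ (so that $P_{n,0}(Z_n \le 0) \to 1/2$), we obtain (8).

We now consider, for any sequence of estimators $\hat G_n$, the event

$A = \{ n^{1/(4(m-m_0)+2)} \, W(G_n(0), \hat G_n) \ge a \}$

for some $a > 0$ to choose. Notice that by the triangle inequality its complement $A^c$ satisfies

$A^c \subset \{ n^{1/(4(m-m_0)+2)} \, W(G_n(u), \hat G_n) \ge c(u,0) - a \}$

where $c(u,0) > 0$ is given by Theorem 2.4 (ii). Choose $a = c(u,0)/2$. Then either $P_{n,0}(A) > 1/4$, which gives

$\sup_{G_1 \in \{G_n(0)\}} n^{1/(4(m-m_0)+2)} \, E_{G_1^{\otimes n}} W(G_1, \hat G_n) \ge \frac{c(u,0)}{8}$,

or $P_{n,0}(A^c) \ge 3/4$ and hence $P_{n,u}(A^c) \ge e^{-\frac{u^2}{2}\Gamma}/4$ in the limit, by (8), so that

$\liminf_{n\to\infty} \; \sup_{G_1 \in \{G_n(u)\}} n^{1/(4(m-m_0)+2)} \, E_{G_1^{\otimes n}} W(G_1, \hat G_n) \ge \frac{c(u,0)}{8} e^{-\frac{u^2}{2}\Gamma}$.

Thus, gathering the two inequalities, we get

$\liminf_{n\to\infty} \; \sup_{G_1 \in \{G_n(0), G_n(u)\}} n^{1/(4(m-m_0)+2)} \, E_{G_1^{\otimes n}} W(G_1, \hat G_n) \ge \frac{c(u,0)}{8} e^{-\frac{u^2}{2}\Gamma} > 0$.

Note to finish that by Theorem 2.4 (i), each $G_n(0)$ or $G_n(u)$ is at $W$-distance at most $n^{-1/(4(m-m_0)+2)+\varepsilon}$ from $G_0$ for $n$ large enough. □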


Remarks 2.1. We want only an example of this slow convergence, and that it be somewhat typical. That is why we have chosen the regularity conditions to make the proof easy, while still being easy to check, in particular for exponential families.

In particular, it would probably be possible to lower $q$ in $(p,q)$-smoothness to $2 + \varepsilon$ and still get the uniform bound we use in the law of large numbers below. Similarly, less differentiability might be necessary if we tried to imitate differentiability in quadratic mean.

In the opposite direction, the variance $\Gamma$ in the limit experiment is really expected to be $E_{G_0} \left| \frac{\cdots}{f(x,G_0)} \right|^2$ in most cases, but more stringent regularity conditions may be needed to prove it.


Proof of Theorem 2.4. In this proof and the rest of the paper, we need to compare asymptotic sequences. The notation $a_n \preccurlyeq b_n$ (or even $a \preccurlyeq b$ if $n$ is kept implicit) means that there is a positive constant $C$ such that $a_n \le C b_n$; in other words, $a_n = O(b_n)$. We will also use $a_n \succcurlyeq b_n$ for $a_n \ge C b_n$, and $a_n \asymp b_n$ for $b_n \preccurlyeq a_n \preccurlyeq b_n$. Finally $a_n \preccurlyeq_u b_n$ means that the constant may depend on $u$, that is $a_n \le C(u) b_n$.

We use the following theorem by Lindsay (1989, Theorem 2A) on the matrix of moments; the idea is close to the Hankel criterion developed by Dacunha-Castelle and Gassiat (1997) to estimate the order of a mixture.

Theorem 2.6. Given numbers $1, m_1, \ldots, m_{2d}$, write $M_k$ for the $(k+1)$ by $(k+1)$ (Hankel) matrix with entries $(M_k)_{i,j} = m_{i+j-2}$ for $k = 1, \ldots, d$.

(a) The numbers $1, m_1, \ldots, m_{2d}$ are the moments of a distribution with exactly $p$ points of support if and only if $\det M_k > 0$ for $k = 1, \ldots, p-1$ and $\det M_p = 0$.

(b) If the numbers $1, m_1, \ldots, m_{2d-2}$ satisfy $\det M_k > 0$ for $k = 1, \ldots, d-1$ and $m_{2d-1}$ is any scalar, then there exists a unique distribution with exactly $d$ points of support and those initial $2d-1$ moments.
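The determinant criterion in part (a) can be checked on a small case. The sketch below, which assumes a two-point distribution chosen for this illustration and uses hypothetical helper names (`moments`, `hankel`, `det`), computes the Hankel matrices of moments in exact arithmetic and evaluates their determinants.

```python
from fractions import Fraction

def moments(dist, order):
    """Moments m_0, ..., m_order of a discrete distribution [(weight, atom), ...]."""
    return [sum(w * a ** k for w, a in dist) for k in range(order + 1)]

def hankel(m, k):
    """(k+1) x (k+1) Hankel matrix M_k with entries m_{i+j}, 0 <= i, j <= k."""
    return [[m[i + j] for j in range(k + 1)] for i in range(k + 1)]

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n, sign, result = len(a), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return sign * result

# Two support points: det M_1 > 0 and det M_2 = 0, as in part (a) with p = 2.
dist = [(Fraction(4, 5), Fraction(-1)), (Fraction(1, 5), Fraction(4))]
m = moments(dist, 4)
hankel_dets = [det(hankel(m, k)) for k in (1, 2)]
```

The zero-based indexing $m_{i+j}$ used in `hankel` is the same as the entries $m_{i+j-2}$ with indices starting at one.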

Set $d = m - m_0 + 1$ and consider any numbers $1, m_1, \ldots, m_{2d-2}$ such that $\det M_1 > 0, \ldots, \det M_{d-1} > 0$. By Theorem 2.6, we may then define for any $u \in \mathbb{R}$ a distribution $G(u) = \sum_{j=m_0}^{m} \pi_j(u) \delta_{h_j(u)}$ such that its initial moments are $1, m_1, \ldots, m_{2d-2}, u$.

Moreover, the uniqueness in Theorem 2.6 implies that, with $\pi_j > 0$ and $h_1 < \cdots < h_d$, the following map is injective:

$\phi : (\pi_1, \ldots, \pi_d, h_1, \ldots, h_d) \mapsto \left( \sum_{j=1}^{d} \pi_j, \; \sum_{j=1}^{d} \pi_j h_j, \; \ldots, \; \sum_{j=1}^{d} \pi_j h_j^{2d-1} \right)$.

Now, its Jacobian is non-zero (see Appendix A.1 for a proof):

(9)    $J(\phi) = (-1)^{\frac{(d-1)d}{2}} \, \pi_1 \cdots \pi_d \prod_{1 \le j < k \le d} (h_j - h_k)^4$.

Thus the inverse of $\phi$ is locally continuous, so that the $h_j(u)$ are all continuous. In particular, they are bounded if $u$ is bounded: for any $U > 0$, there is a finite $H(U)$ such that if $|u| \le U$, then $|h_j(u)| \le H(U)$. We may then find and use a sequence $u_{\max}(n)$ such that $u_{\max}(n) \to \infty$ and $H(u_{\max}(n)) \, n^{-1/(4d-2)} \to 0$.

We now define the mixing distributions

(10)    $G_n(u) = \sum_{j=1}^{m_0-1} \pi_j \delta_{\theta_j} + \pi_{m_0} \sum_{j=m_0}^{m} \pi_j(u) \delta_{\theta_{j,n}(u)}$

with

$\theta_{j,n}(u) = \theta_{m_0} + n^{-1/(4d-2)} h_j(u)$.

This definition satisfies (iv). The form of $G_n(u)$ makes it clear that it converges to $G_0$ at speed $n^{-1/(4d-2)}$. It is easily seen from the dual representation (3) of $W$ that for $|u| \le u_{\max}(n)$,

$W(G_n(u),G_0) \preccurlyeq_u n^{-1/(4d-2)}$.

This proves (i).

Moreover, since all other points and proportions are equal, the transportation distance $W(G_n(u_1),G_n(u_2))$ is equal to the transportation distance between the last $d$ components. Since those support points keep the same weights and are homothetic with scale $n^{-1/(4d-2)}$ around $\theta_{m_0}$, we have exactly

$W(G_n(u_1),G_n(u_2)) = W(G_1(u_1),G_1(u_2)) \, n^{-1/(4d-2)}$.

This proves (ii).

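We now prove local asymptotic normality. In order to shorten notations, the probability under the mixing distribution $G_n(0)$ will be denoted by $P_{n,0}$ and the corresponding expectation $E_{n,0}$. Let $(X_{i,n})_{1 \le i \le n}$ be an i.i.d. sample with density $\bigotimes_{i=1}^{n} f(x_i, G_n(0))$. Then, we can write the log-likelihood ratio as

$Z_{n,0}(u) = \mathrm{Log}\left( \frac{\prod_{i=1}^{n} f(X_{i,n},G_n(u))}{\prod_{i=1}^{n} f(X_{i,n},G_n(0))} \right) = \sum_{i=1}^{n} \mathrm{Log}(1 + Y_{i,n}(u))$

with

(11)    $Y_{i,n}(u) = \frac{f(X_{i,n},G_n(u)) - f(X_{i,n},G_n(0))}{f(X_{i,n},G_n(0))}$.

By definition, we have

$f(x,G_n(u)) - f(x,G_0) = \pi_{m_0} \sum_{j=m_0}^{m} \pi_j(u) \left[ f(x,\theta_{j,n}(u)) - f(x,\theta_{m_0}) \right]$.

Moreover, by Taylor expansion with remainder,

$f(x,\theta_{j,n}(u)) - f(x,\theta_{m_0}) = \sum_{k=1}^{2d-1} \frac{h_j(u)^k}{k! \, n^{k/(4d-2)}} f^{(k)}(x,\theta_{m_0}) + \int_{\theta_{m_0}}^{\theta_{j,n}(u)} f^{(2d)}(x,\theta) \frac{(\theta_{j,n}(u) - \theta)^{2d-1}}{(2d-1)!} \, d\theta$

so that we get by linearity

(12)    $f(x,G_n(u)) - f(x,G_0) = \pi_{m_0} \left[ \sum_{k=1}^{2d-1} \frac{m_k}{k! \, n^{k/(4d-2)}} f^{(k)}(x,\theta_{m_0}) + R_n(x,u) \right]$

with moments $m_1, \ldots, m_{2d-2}$ that do not depend on $u$ but $m_{2d-1} = u$, and

(13)    $R_n(x,u) = \sum_{j=m_0}^{m} \pi_j(u) \int_{\theta_{m_0}}^{\theta_{j,n}(u)} f^{(2d)}(x,\theta) \frac{(\theta_{j,n}(u) - \theta)^{2d-1}}{(2d-1)!} \, d\theta$.

Thus, we can write from (11), (12) and (13)

(14)    $Y_{i,n}(u) = \pi_{m_0} \left[ \frac{u}{\sqrt{n}} Z_{i,n} + R_{i,n}(u) - R_{i,n}(0) \right]$

with

$Z_{i,n} = \frac{f^{(2d-1)}(X_{i,n},\theta_{m_0})}{(2d-1)! \, f(X_{i,n},G_n(0))}, \qquad R_{i,n}(u) = \frac{R_n(X_{i,n},u)}{f(X_{i,n},G_n(0))}$.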
For each fixed $n$ and $u$, the $(Y_{i,n}(u), Z_{i,n}, R_{i,n}(u))$ are i.i.d. and centered under $G_n(0)$. Indeed, from (11), we have

$E_{n,0} Y_{i,n}(u) = \int [f(x,G_n(u)) - f(x,G_n(0))] \, d\lambda(x) = 0$;

furthermore by expanding $f$ around $\theta_{m_0}$, we get iteratively using $(p,q)$-smoothness that for $k = 1, \ldots, 2d-1$

$E_{n,0} \left[ \frac{f^{(k)}(X_{i,n},\theta_{m_0})}{f(X_{i,n},G_n(0))} \right] = 0$

and in particular $E_{n,0} Z_{i,n} = 0$. And dividing (12) by $f(x,G_n(0))$ gives as a result $E_{n,0} R_{i,n}(u) = 0$ for all $u$.

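Consider

(15)    $Z_n = \pi_{m_0} n^{-1/2} \sum_{i=1}^{n} Z_{i,n}$.

By Proposition 2.3, there are positive finite constants $c$ and $C$ independent of $n$ for $n$ large enough such that $c \le E_{n,0}|Z_{i,n}|^2 \le C$. Up to taking a subsequence, we may then assume $E_{n,0}|Z_{1,n}|^2 \to \sigma^2$ for some positive $\sigma$. By Proposition 2.3 again, we have $E_{n,0}|Z_{i,n}|^3 \le C' < \infty$ for all $n$ large enough. We may then apply Lyapunov theorem (Billingsley, 1995, Theorem 23.7) to prove that, with $\Gamma = \pi_{m_0}^2 \sigma^2$,

(16)    $Z_n \xrightarrow{d} \mathcal{N}(0,\Gamma)$.

Indeed, setting $S_n^2 := \sum_{i=1}^{n} E_{n,0}|Z_{i,n}|^2 \sim n\sigma^2$, we see that the Lyapunov condition

$S_n^{-3} \sum_{i=1}^{n} E_{n,0}|Z_{i,n}|^3 \sim n^{-1/2} \sigma^{-3} E_{n,0}|Z_{1,n}|^3 \xrightarrow[n\to\infty]{} 0$

is satisfied so that $S_n^{-1} \sum_{i=1}^{n} Z_{i,n}$ converges in distribution to $\mathcal{N}(0,1)$ and (16) follows from the equality $Z_n = \pi_{m_0} \left( S_n^2 / n \right)^{1/2} S_n^{-1} \sum_{i=1}^{n} Z_{i,n}$.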


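Now, to get the convergence in probability of $Z_{n,0}(u) - uZ_n + \frac{u^2}{2}\Gamma$ to zero, it is enough to show the following convergences for all $u$:

(17)    $\sum_{i=1}^{n} Y_{i,n}(u) - uZ_n \xrightarrow{L^2} 0$,

(18)    $\sum_{i=1}^{n} Y_{i,n}(u)^2 - u^2\Gamma \xrightarrow{P} 0$,

(19)    $\sum_{i=1}^{n} |Y_{i,n}(u)|^3 \xrightarrow{L^1} 0$.

Indeed, we will have, since $|\mathrm{Log}(1+y) - y + y^2/2| \le C|y|^3$ for $|y| \le 1/2$,

$\left| Z_{n,0}(u) - \sum_{i=1}^{n} Y_{i,n}(u) + \frac{1}{2} \sum_{i=1}^{n} Y_{i,n}(u)^2 \right| \le C \sum_{i=1}^{n} |Y_{i,n}(u)|^3$

with probability going to one with $n$, so that

$Z_{n,0}(u) - uZ_n + \frac{u^2}{2}\Gamma = \left[ \sum_{i=1}^{n} Y_{i,n}(u) - uZ_n \right] + \frac{1}{2} \left[ u^2\Gamma - \sum_{i=1}^{n} Y_{i,n}(u)^2 \right] + \left[ Z_{n,0}(u) - \sum_{i=1}^{n} Y_{i,n}(u) + \frac{1}{2} \sum_{i=1}^{n} Y_{i,n}(u)^2 \right]$

will tend to 0 in probability if (17), (18) and (19) hold.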
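The elementary inequality on the logarithm used in this step can be checked numerically. The sketch below evaluates the ratio of the third-order Taylor error of $\mathrm{Log}(1+y)$ to $|y|^3$ on a grid of $[-1/2, 1/2]$; the grid size and the function name `log_expansion_error` are arbitrary choices for this illustration. Bounding the series $\sum_{k \ge 3} |y|^{k-3}/k$ by a geometric series shows that $C = 2/3$ is admissible on this interval.

```python
import math

def log_expansion_error(y):
    """|log(1 + y) - y + y^2 / 2|, the third-order Taylor error used in the proof."""
    return abs(math.log1p(y) - y + 0.5 * y * y)

# Grid over [-1/2, 1/2], skipping y = 0 where both sides vanish.
grid = [k / 1000.0 for k in range(-500, 501) if k != 0]
worst_ratio = max(log_expansion_error(y) / abs(y) ** 3 for y in grid)
```

The largest ratio is reached at the left endpoint $y = -1/2$, where all terms of the series have the same sign.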
To prove (17), note that from (14) and (15)

$\sum_{i=1}^{n} Y_{i,n}(u) - uZ_n = \pi_{m_0} \left( \sum_{i=1}^{n} R_{i,n}(u) - \sum_{i=1}^{n} R_{i,n}(0) \right)$

and the equalities

$E_{n,0} \left| \sum_{i=1}^{n} R_{i,n}(u) \right|^2 = \sum_{i=1}^{n} E_{n,0} R_{i,n}(u)^2 = n E_{n,0}|R_{1,n}(u)|^2$

will give the desired $L^2$-convergence if we can prove that for each $u$,

(20)    $n E_{n,0}|R_{1,n}(u)|^2 \to 0$.

To this end, we look at the expression (13) of $R_n(x,u)$ for fixed $u$. We have $|\theta_{j,n}(u) - \theta| \le H(u) n^{-1/(4d-2)}$ for any $\theta$ in the integrand, any $j$ and $n$.
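We may thus write

$|R_n(x,u)| \le \sum_{j=m_0}^{m} \pi_j(u) \int_{\theta_{m_0} - H(u) n^{-1/(4d-2)}}^{\theta_{m_0} + H(u) n^{-1/(4d-2)}} |f^{(2d)}(x,\theta)| \, \frac{\left( H(u) n^{-1/(4d-2)} \right)^{2d-1}}{(2d-1)!} \, d\theta \preccurlyeq_u n^{-1/2} \int_{\theta_{m_0} - H(u) n^{-1/(4d-2)}}^{\theta_{m_0} + H(u) n^{-1/(4d-2)}} |f^{(2d)}(x,\theta)| \, d\theta$.

Since we have $\sigma$-finite measures, we may use Fubini theorem. Since moreover $\theta$ in the integrand is between $\theta_{m_0}$ and $\theta_{j,n}(u)$ which converges to $\theta_{m_0}$, we may then apply Proposition 2.3. For $q \in \llbracket 1, 4 \rrbracket$, using convexity of $x \mapsto x^q$ on line two, we may then write:

$E_{n,0}|R_{1,n}(u)|^q \preccurlyeq_u n^{-q/2} \, E_{n,0} \left| \int_{|\theta - \theta_{m_0}| \le H(u) n^{-1/(4d-2)}} \frac{|f^{(2d)}(x,\theta)|}{f(x,G_n(0))} \, d\theta \right|^q$

$\preccurlyeq_u n^{-\frac{q}{2} - \frac{q-1}{4d-2}} \int_{|\theta - \theta_{m_0}| \le H(u) n^{-1/(4d-2)}} E_{n,0} \left| \frac{f^{(2d)}(x,\theta)}{f(x,G_n(0))} \right|^q d\theta$

$\preccurlyeq_u n^{-\frac{q}{2} - \frac{q}{4d-2}} \, C$

with $C$ from Proposition 2.3. In particular,

(21)    $n^{q/2} E_{n,0}|R_{1,n}(u)|^q \preccurlyeq_u n^{-q/(4d-2)} \to 0$.

Take $q = 2$ to obtain (20); the proof of (17) is complete.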
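To prove (18), note first that from (14) and (15),

$\sum_{i=1}^{n} Y_{i,n}(u)^2 - u^2 \pi_{m_0}^2 n^{-1} \sum_{i=1}^{n} Z_{i,n}^2 = \pi_{m_0}^2 \sum_{i=1}^{n} \left( R_{i,n}(u) - R_{i,n}(0) \right)^2 + 2u \pi_{m_0}^2 n^{-1/2} \sum_{i=1}^{n} \left( R_{i,n}(u) - R_{i,n}(0) \right) Z_{i,n}$

so that taking the $L^1$-norm and by the Cauchy-Schwarz inequality,

$E_{n,0} \left| \sum_{i=1}^{n} Y_{i,n}(u)^2 - u^2 \pi_{m_0}^2 n^{-1} \sum_{i=1}^{n} Z_{i,n}^2 \right| \preccurlyeq_u n E_{n,0}|R_{1,n}(u)|^2 + n E_{n,0}|R_{1,n}(0)|^2 + \sqrt{n E_{n,0}|R_{1,n}(u)|^2 + n E_{n,0}|R_{1,n}(0)|^2} \, \sqrt{E_{n,0} Z_{1,n}^2}$,

and the r.h.s. tends to 0 by (20) and the fact that $E_{n,0} Z_{1,n}^2 \le C$. Moreover, setting $\delta_n := E_{n,0} Z_{1,n}^2 - \sigma^2$, we have

$E_{n,0} \left| n^{-1} \sum_{i=1}^{n} Z_{i,n}^2 - \sigma^2 \right|^2 = E_{n,0} \left| n^{-1} \sum_{i=1}^{n} \left( Z_{i,n}^2 - E_{n,0} Z_{1,n}^2 \right) + \delta_n \right|^2 = n^{-1} \mathrm{Var}_{n,0}(Z_{1,n}^2) + \delta_n^2$

which goes to zero since $\delta_n \to 0$ by definition and $E_{n,0} Z_{1,n}^4 \le C$ for some constant $C$ by Proposition 2.3. We have thus

$\sum_{i=1}^{n} Y_{i,n}(u)^2 - u^2 \pi_{m_0}^2 n^{-1} \sum_{i=1}^{n} Z_{i,n}^2 \xrightarrow{L^1} 0$

and

$u^2 \pi_{m_0}^2 n^{-1} \sum_{i=1}^{n} Z_{i,n}^2 - u^2 \Gamma \xrightarrow{L^2} 0$

which prove (18).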

We turn to the proof of (19). It is easily seen from (14) that

$\sum_{i=1}^{n} |Y_{i,n}(u)|^3 \preccurlyeq_u n^{-3/2} \sum_{i=1}^{n} |Z_{i,n}|^3 + \sum_{i=1}^{n} |R_{i,n}(u)|^3 + \sum_{i=1}^{n} |R_{i,n}(0)|^3$

so that taking expectations

$E_{n,0} \sum_{i=1}^{n} |Y_{i,n}(u)|^3 \preccurlyeq_u n^{-1/2} E_{n,0}|Z_{1,n}|^3 + n E_{n,0}|R_{1,n}(u)|^3 + n E_{n,0}|R_{1,n}(0)|^3$.

But each of the three terms in the r.h.s. tends to 0: the first one because $E_{n,0}|Z_{1,n}|^3 \le C$ by Proposition 2.3, the second and the third ones because of (21) for $q = 3$. Thus $\sum_{i=1}^{n} |Y_{i,n}(u)|^3$ converges to 0 in $L^1$. □


Example 2.2. Let us take $m = 2$, $m_0 = 1$ and $\theta_{m_0} = 0$ so that $G_0 = \delta_0$. Then $G_{1,n} = \frac{1}{2} \left( \delta_{-2n^{-1/6}} + \delta_{2n^{-1/6}} \right)$ and $G_{2,n} = \frac{4}{5} \delta_{-n^{-1/6}} + \frac{1}{5} \delta_{4n^{-1/6}}$ have 0 as first moment, and $4n^{-1/3}$ as second moment. The third moments are respectively zero for $G_{1,n}$ and $12n^{-1/2}$ for $G_{2,n}$. With the notation (10) in the proof of Theorem 2.4, we have $G_{1,n} = G_n(0)$ and $G_{2,n} = G_n(12)$. Clearly, one has $W(G_{1,n},G_{2,n}) = \frac{9}{5} n^{-1/6}$ for all $n$ and as a by-product of Theorem 2.4 (iii), $\{G_{1,n}\}$ and $\{G_{2,n}\}$ are contiguous.
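The moment matching in this example can be verified in exact arithmetic. The sketch below works with the unscaled distributions (the case $n = 1$), taking $G_1 = \frac{1}{2}(\delta_{-2} + \delta_{2})$ and assuming $G_2 = \frac{4}{5}\delta_{-1} + \frac{1}{5}\delta_{4}$, which is the only two-point distribution with first three moments 0, 4 and 12. The helper names are hypothetical.

```python
from fractions import Fraction

def moment(dist, k):
    """k-th moment of a discrete distribution given as [(weight, atom), ...]."""
    return sum(w * a ** k for w, a in dist)

def wasserstein1(g1, g2):
    """Integral of |G1(-inf, t] - G2(-inf, t]| over t for two discrete distributions."""
    atoms = sorted({a for _, a in g1} | {a for _, a in g2})
    cdf1 = cdf2 = 0
    total = 0
    for left, right in zip(atoms, atoms[1:]):
        cdf1 += sum(w for w, a in g1 if a == left)
        cdf2 += sum(w for w, a in g2 if a == left)
        total += abs(cdf1 - cdf2) * (right - left)
    return total

# Unscaled support points; multiplying every atom by n^(-1/6) gives G_{1,n}, G_{2,n}.
G1 = [(Fraction(1, 2), Fraction(-2)), (Fraction(1, 2), Fraction(2))]
G2 = [(Fraction(4, 5), Fraction(-1)), (Fraction(1, 5), Fraction(4))]
moments_G1 = [moment(G1, k) for k in (1, 2, 3)]
moments_G2 = [moment(G2, k) for k in (1, 2, 3)]
unscaled_distance = wasserstein1(G1, G2)
```

Since both distributions are rescaled by the same factor $n^{-1/6}$, the $k$-th moments scale by $n^{-k/6}$ and the Wasserstein distance by $n^{-1/6}$, so the unscaled values carry all the information.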

3. The rate $n^{-1/(4(m-m_0)+2)}$ is optimal. We follow Deely and Kruse (1968) and Chen's (1995) strategy of estimating $G$ by minimizing the $L^\infty$ distance to the empirical distribution function (28). We then need to control this distance in terms of the Wasserstein metric (Theorem 3.2), under appropriate identifiability conditions. To do so, we consider sequences of pairs $(G_{1,n}, G_{2,n})$ minimizing the relevant ratios, and express $F(x,G_{1,n}) - F(x,G_{2,n})$ as a sum on their components $F(x,\theta_{j,n})$ and relevant derivatives. A difficulty arises: distinct components $\theta_{j,n}$ may converge to the same $\theta_j$, leading to cancellations in the sums. Forgetting this case was the mistake by Chen (1995) in the proof of their Lemma 2. We deal with it by using a coarse-graining tree: each node corresponds to sets of components that converge to the same point at a given rate. We may then use Taylor expansions on each node and its descendants, while ensuring that we keep non-zero terms (Lemma 3.6).

3.1. Strong identifiability of order k. In what follows $\|\cdot\|_\infty$ is the supremum norm with respect to $x$ and $\|\cdot\|$ is the Euclidean norm (for instance). Recall that $F^{(p)}(x,\theta)$ is the $p$-th derivative of $F(x,\theta)$ with respect to $\theta$.


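Definition 3.1. A family $\{F(x,\theta), \theta \in \Theta\}$ of distribution functions is $k$-strongly identifiable if for any finite set of, say, $m$ distinct $\theta_j$, the equality

$\left\| \sum_{p=0}^{k} \sum_{j=1}^{m} \alpha_{p,j} F^{(p)}(x,\theta_j) \right\|_\infty = 0$

implies $\alpha_{p,j} = 0$ for all $p$ and $j$.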


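Remark 3.1. For a $k$-strongly identifiable family and fixed $\theta_j$, we may consider

$\inf_{\|\alpha\| = 1} \left\| \sum_{p=0}^{k} \sum_{j=1}^{m} \alpha_{p,j} F^{(p)}(x,\theta_j) \right\|_\infty$.

Since the inner norm is a continuous function of $\alpha$ and the sphere is compact, this infimum is attained, and hence not zero: for some $c(\theta_1, \ldots, \theta_m) > 0$, we have:

(22)    $\left\| \sum_{p=0}^{k} \sum_{j=1}^{m} \alpha_{p,j} F^{(p)}(x,\theta_j) \right\|_\infty \ge c(\theta_1, \ldots, \theta_m) \, \|\alpha\|$.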


3.2. Main result and corollaries. 
Theorem 3.2. Assume that $\{F(x,\theta), \theta \in \Theta\}$ is $2m$-strongly identifiable and that $F(x,\theta)$ is $2m$-differentiable with respect to $\theta$ for all $x$, with

(23)    $F^{(2m)}(x,\theta_1) - F^{(2m)}(x,\theta_2) = o(\theta_1 - \theta_2)$

uniformly in $x$. Then, for any $G_0 \in \mathcal{G}_{m_0}$, there are $\varepsilon > 0$ and $\delta > 0$ such that

(24)    $\inf_{\substack{G_1, G_2 \in \mathcal{G}_{\le m},\; G_1 \ne G_2 \\ W(G_1,G_0) \vee W(G_2,G_0) \le \varepsilon}} \frac{\|F(x,G_1) - F(x,G_2)\|_\infty}{W(G_1,G_2)^{2m-2m_0+1}} > \delta$.


Corollary 3.3. Under the conditions of Theorem 3.2, there exists $\delta > 0$ such that

(25)    $\inf_{\substack{G_1, G_2 \in \mathcal{G}_{\le m} \\ G_1 \ne G_2}} \frac{\|F(x,G_1) - F(x,G_2)\|_\infty}{W(G_1,G_2)^{2m-1}} > \delta$.

Proof of Corollary 3.3. Consider a sequence $(G_{1,n}, G_{2,n})$ in $\mathcal{G}_{\le m}$ with $G_{1,n} \ne G_{2,n}$ for each $n$ and such that

(26)    $\frac{\|F(x,G_{1,n}) - F(x,G_{2,n})\|_\infty}{W(G_{1,n},G_{2,n})^{2m-1}} \to \inf_{\substack{G_1, G_2 \in \mathcal{G}_{\le m} \\ G_1 \ne G_2}} \frac{\|F(x,G_1) - F(x,G_2)\|_\infty}{W(G_1,G_2)^{2m-1}}$.

We can assume that $(G_{1,n}, G_{2,n})$ converges to some limit $(G_{1,\infty}, G_{2,\infty})$ in the compact set $\mathcal{G}_{\le m}^2$. Distinguish two cases.

Suppose first that $G_{1,\infty} \ne G_{2,\infty}$. Set $w := W(G_{1,\infty},G_{2,\infty}) > 0$ and let $x_0$ be such that $z_0 := |F(x_0,G_{1,\infty}) - F(x_0,G_{2,\infty})| > 0$. Then, for all $n$

(27)    $\frac{\|F(x,G_{1,n}) - F(x,G_{2,n})\|_\infty}{W(G_{1,n},G_{2,n})^{2m-1}} \ge \frac{|F(x_0,G_{1,n}) - F(x_0,G_{2,n})|}{W(G_{1,n},G_{2,n})^{2m-1}}$.

The numerator of the r.h.s. of (27) tends to $z_0$ since $|F(x_0,G_{i,n}) - F(x_0,G_{i,\infty})|$ is bounded by $K_0 W(G_{i,n},G_{i,\infty})$ with $K_0 = \max_{\theta\in\Theta} |F^{(1)}(x_0,\theta)|$ ($i = 1,2$). And by assumption, $W(G_{1,n},G_{2,n})$ tends to $w$. As a consequence, (27) and (26) give (25) by choosing $\delta := z_0 / w^{2m-1}$.

Suppose now that $G_{1,\infty} = G_{2,\infty}$. Set $G_0 := G_{1,\infty}$ which is in $\mathcal{G}_{m_0}$ with some $m_0$ at most $m$. Consider $\varepsilon > 0$ and $\delta > 0$ as defined in (24); for $n$ large enough, say $n \ge n_0$, $W(G_{i,n},G_0)$ ($i = 1,2$) is less than $\varepsilon$ so that by (24)

$\inf_{n \ge n_0} \frac{\|F(x,G_{1,n}) - F(x,G_{2,n})\|_\infty}{W(G_{1,n},G_{2,n})^{2m-2m_0+1}} > \delta$.

Moreover, for $n$ large enough, say $n \ge n_1$, $W(G_{1,n},G_{2,n})$ is small so that $W(G_{1,n},G_{2,n})^{2m-2m_0+1}$ is more than $W(G_{1,n},G_{2,n})^{2m-1}$ and thus for all $n \ge n_0 + n_1$,

$\frac{\|F(x,G_{1,n}) - F(x,G_{2,n})\|_\infty}{W(G_{1,n},G_{2,n})^{2m-1}} \ge \inf_{n \ge n_0 + n_1} \frac{\|F(x,G_{1,n}) - F(x,G_{2,n})\|_\infty}{W(G_{1,n},G_{2,n})^{2m-2m_0+1}} > \delta$.

□

Corollary 3.4. Let $\varepsilon > 0$. Under the assumptions of Theorem 3.2, let $G_0 \in \mathcal{G}_{m_0}$ and $F_n$ be the empirical distribution of $n$ i.i.d. random variables with distribution $F(x,G_1)$. Let $\hat G_n$ be a near optimal estimator of $G_1$ in the following sense:

(28)    $\|F(x,\hat G_n) - F_n(x)\|_\infty \le \inf_{G \in \mathcal{G}_{\le m}} \|F(x,G) - F_n(x)\|_\infty + \frac{1}{n}$.

Then,

$W(\hat G_n,G_1) \preccurlyeq n^{-1/(4(m-m_0)+2)}$

in probability under $G_1$, uniformly for $G_1 \in \mathcal{G}_{\le m}$ such that $W(G_1,G_0) < \varepsilon$.
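The minimum distance principle behind (28) can be sketched on a toy problem. The code below is only an illustration under strong simplifying assumptions: the family is a Gaussian location family, the infimum over $\mathcal{G}_{\le m}$ is replaced by a search over a small finite list of two-component candidates with equal weights, and the sample is a fixed list of numbers. All function names are hypothetical.

```python
import math

def normal_cdf(x, theta):
    """CDF of N(theta, 1) at x; an illustrative choice for F(x, theta)."""
    return 0.5 * (1.0 + math.erf((x - theta) / math.sqrt(2.0)))

def mixture_cdf(x, mixing):
    """F(x, G) for G given as a list of (weight, atom) pairs."""
    return sum(w * normal_cdf(x, theta) for w, theta in mixing)

def sup_distance(sample, cdf):
    """sup_x |cdf(x) - F_n(x)| for a continuous cdf and the empirical cdf F_n."""
    xs = sorted(sample)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        value = cdf(x)
        # F_n jumps from i/n to (i+1)/n at the i-th order statistic.
        worst = max(worst, abs(value - i / n), abs(value - (i + 1) / n))
    return worst

def minimum_distance_estimator(sample, candidates):
    """Return the candidate mixing distribution minimising the sup distance."""
    return min(candidates, key=lambda g: sup_distance(sample, lambda x: mixture_cdf(x, g)))

sample = [-1.3, -0.4, 0.1, 0.2, 0.9, 1.7]
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
candidates = [[(0.5, a), (0.5, b)] for a in grid for b in grid if a <= b]
estimate = minimum_distance_estimator(sample, candidates)
```

Because the model distribution function is continuous and monotone, the supremum against the step function $F_n$ is attained at a sample point, just before or at a jump, which is why `sup_distance` only inspects the order statistics.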

Proof of Corollary 3.4. We simply follow Chen (1995, Theorem 2). By the triangle inequality and (28) (choose $G = G_1$), we have

$\|F(x,\hat G_n) - F(x,G_1)\|_\infty \le 2 \|F(x,G_1) - F_n(x)\|_\infty + \frac{1}{n}$.

Moreover by the DKW inequality (Massart, 1990), we have

$\|F(x,G_1) - F_n(x)\|_\infty \preccurlyeq \frac{1}{\sqrt{n}}$

and thus

(29)    $\|F(x,\hat G_n) - F(x,G_1)\|_\infty \preccurlyeq \frac{1}{\sqrt{n}}$

in probability under $G_1$, uniformly in $G_1$.
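To see the $1/\sqrt{n}$ scale concretely, recall that the DKW inequality with Massart's constant bounds $P(\|F_n - F\|_\infty > \epsilon)$ by $2\exp(-2n\epsilon^2)$, whatever the underlying distribution. The sketch below inverts this bound; the function names and the level 0.05 are choices made for this illustration.

```python
import math

def dkw_tail_bound(n, eps):
    """Upper bound 2 exp(-2 n eps^2) on P(sup_x |F_n(x) - F(x)| > eps)."""
    return 2.0 * math.exp(-2.0 * n * eps * eps)

def dkw_radius(n, alpha):
    """Smallest eps for which the DKW bound is at most alpha."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

radius_100 = dkw_radius(100, 0.05)
radius_400 = dkw_radius(400, 0.05)
```

Multiplying the sample size by four halves the radius, and the bound does not depend on $G_1$, which is what makes the statement uniform in $G_1$.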

We also have $W(\hat G_n,G_1) \to 0$. Otherwise, since $\hat G_n$ is in the compact space $\mathcal{G}_{\le m}$, there would be a subsequence $\hat G_{n_k}$ which converges to some $G_2 \ne G_1$ and thus we would have for all $x$:

$|F(x,\hat G_{n_k}) - F(x,G_2)| \le \max_\theta |F^{(1)}(x,\theta)| \, W(\hat G_{n_k},G_2) \to 0$.

This, together with (29), would imply $|F(x,G_1) - F(x,G_2)| = 0$ for all $x$, which contradicts identifiability.

Consequently, if $W(G_1,G_0) < \varepsilon$, we have $W(\hat G_n,G_0) < 2\varepsilon$ for $n$ large enough, and by Theorem 3.2 and (29),

$W(\hat G_n,G_1)^{2m-2m_0+1} \preccurlyeq \|F(x,\hat G_n) - F(x,G_1)\|_\infty \preccurlyeq \frac{1}{\sqrt{n}}$

in probability under $G_1$, uniformly in $G_1 \in \mathcal{G}_{\le m}$ such that $W(G_1,G_0) < \varepsilon$. □


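3.3. Proof of the main Theorem 3.2. In all this section, keep in mind the hypotheses of Theorem 3.2: the family $\{F(x,\theta), \theta \in \Theta\}$ is $2m$-strongly identifiable and $F(x,\theta)$ is $2m$-differentiable with respect to $\theta$ for all $x$, with

$F^{(2m)}(x,\theta_1) - F^{(2m)}(x,\theta_2) = o(\theta_1 - \theta_2)$

uniformly in $x$. Note first that proving (24) amounts to proving

$\lim_{n\to\infty} \; \inf_{\substack{G_1 \ne G_2 \\ W(G_1,G_0) \vee W(G_2,G_0) \le 1/n}} \frac{\|F(x,G_1) - F(x,G_2)\|_\infty}{W(G_1,G_2)^{2m-2m_0+1}} > \delta$.

From now on, we consider two sequences $(G_{1,n})$, $(G_{2,n})$ in $\mathcal{G}_{\le m}$ such that for each $n \ge 1$:

• $G_{1,n} \ne G_{2,n}$;

• $W(G_{i,n},G_0) \le \frac{1}{n}$ ($i = 1,2$);

• $\inf_{\substack{G_1, G_2 \in \mathcal{G}_{\le m},\; G_1 \ne G_2 \\ W(G_i,G_0) \le 1/n}} \frac{\|F(x,G_1) - F(x,G_2)\|_\infty}{W(G_1,G_2)^{2m-2m_0+1}} \ge \frac{\|F(x,G_{1,n}) - F(x,G_{2,n})\|_\infty}{W(G_{1,n},G_{2,n})^{2m-2m_0+1}} - \frac{1}{n}$.

Consequently, it is enough to prove that

(30)    $\liminf_{n\to\infty} \frac{\|F(x,G_{1,n}) - F(x,G_{2,n})\|_\infty}{W(G_{1,n},G_{2,n})^{2m-2m_0+1}} > \delta$.

Since $(G_{1,n})$, $(G_{2,n})$ are two sequences in $\mathcal{G}_{\le m}$ and $m$ is finite, we may and do assume that $(G_{i,n}) \in \mathcal{G}_{m_i}$ for some $m_i \le m$ and $i = 1,2$. We can then write for each $n$

$G_{1,n} = \sum_{j=1}^{m_1} \pi_{1,j,n} \delta_{\theta_{1,j,n}} \quad \text{and} \quad G_{2,n} = \sum_{j=m_1+1}^{m_1+m_2} \pi_{2,j,n} \delta_{\theta_{2,j,n}}$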
j=l j=mi+l 
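and define for each $n$ a signed measure $G_n$ of total mass zero:

$G_n = G_{1,n} - G_{2,n} = \sum_{j=1}^{m_1+m_2} \pi_{j,n} \delta_{\theta_{j,n}}$

with

$(\pi_{j,n}, \theta_{j,n}) = (\pi_{1,j,n}, \theta_{1,j,n})$ for $j \in \llbracket 1, m_1 \rrbracket$, and $(\pi_{j,n}, \theta_{j,n}) = (-\pi_{2,j,n}, \theta_{2,j,n})$ for $j \in \llbracket m_1 + 1, m_1 + m_2 \rrbracket$.

Set for short

$J_0 = \llbracket 1, m_1 + m_2 \rrbracket$.

Since $J_0$ is finite, up to selecting a subsequence of $G_n$, we may find a finite number of scaling sequences $\varepsilon_{0,n}, \varepsilon_{1,n}, \ldots, \varepsilon_{s_{\max},n}$ together with integers $s(j,k)$ and $\sigma(J)$ in $\llbracket 0, s_{\max} \rrbracket$ for any $j, k \in J_0$ and $J \subset J_0$, such that

$0 = \varepsilon_{0,n} < \varepsilon_{1,n} < \cdots < \varepsilon_{s_{\max},n} = 1$, with $\varepsilon_{s,n} = o(\varepsilon_{s+1,n})$,

(31)    $|\theta_{j,n} - \theta_{k,n}| \asymp \varepsilon_{s(j,k),n}$,

(32)    $\left| \sum_{j \in J} \pi_{j,n} \right| \asymp \varepsilon_{\sigma(J),n}$.

We also define the $s$-diameter of $J$ as

$s(J) = \sup_{j,k \in J} s(j,k)$.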

3.3.2. Defining a tree for the key lemmas. Note that the map $s(\cdot,\cdot)$ defined by (31) is an ultrametric on $J_0$ (but does not separate points). Thus we may define a tree $\mathcal{T}$ whose vertices are indexed by the distinct ultrametric closed balls $J = B(j,s)$ when $j$ ranges over $J_0$ and $s$ over $\llbracket 0, s_{\max} \rrbracket$.

Indeed, if $I$ and $J$ are two such balls, and $I \cap J \ne \emptyset$, then either $I \subset J$ or $J \subset I$.

So that, defining the set of descendants and the set of children of $J$ by

$\mathrm{Desc}(J) = \{I \in \mathcal{T} : I \subsetneq J\}$;

$\mathrm{Child}(J) = \{I \in \mathrm{Desc}(J) : I \subset H \subsetneq J, H \in \mathcal{T} \Rightarrow H = I\}$,

we get a tree $\mathcal{T}$ with root $J_0$, and where the parent of $J \ne J_0$ is given by

$p(J) = K \iff J \in \mathrm{Child}(K)$.
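The construction of the tree from the ultrametric can be sketched in a few lines. The example below assumes a made-up table of levels $s(j,k)$ on four components and uses hypothetical helper names; it lists the distinct closed balls and assigns to each one its parent, the smallest ball strictly containing it.

```python
def ultrametric_balls(points, s, levels):
    """Distinct closed balls B(j, r) = {k : s(j, k) <= r} for j in points, r in levels."""
    return {frozenset(k for k in points if s[j][k] <= r) for j in points for r in levels}

def parent_map(balls):
    """Map each ball except the root to its parent, the smallest ball strictly containing it."""
    parents = {}
    for ball in balls:
        bigger = [other for other in balls if ball < other]
        if bigger:
            parents[ball] = min(bigger, key=len)
    return parents

# Four components: 0 and 1 merge at level 1, 2 and 3 at level 2, everything at level 3.
points = range(4)
s = [
    [0, 1, 3, 3],
    [1, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]
balls = ultrametric_balls(points, s, levels=range(4))
parents = parent_map(balls)
root = max(balls, key=len)
```

Because two ultrametric balls are either disjoint or nested, the balls containing a given ball form a chain, so the smallest strict superset is well defined and `parents` encodes a tree.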
Lemma 3.5. With the above notations, given the tree $\mathcal{T}$,

(33)    $W(G_{1,n},G_{2,n}) \preccurlyeq \max_{J \in \mathrm{Desc}(J_0)} \varepsilon_{\sigma(J),n} \, \varepsilon_{s(p(J)),n}$.

Proof. See Appendix A.3. □

Set now

$F(x,G_n) := F(x,G_{1,n}) - F(x,G_{2,n})$

and for $J \subset J_0$,

$F(x,J) := \sum_{j \in J} \pi_{j,n} F(x,\theta_{j,n})$.

Note that $F(x,G_n) = F(x,J_0)$. We now use Taylor expansions along the tree $\mathcal{T}$ to express the order of $F(x,G_n)$ in terms of the scaling functions $\varepsilon_s$.


Lemma 3.6. Let $J$ be a vertex of the tree $\mathcal{T}$ and set $d_J = \mathrm{card}(J)$. Pick $\theta_J := \theta_{j,n}$ in the set $\{\theta_{j,n} : j \in J\}$. The subscript $n$ is skipped from the following notations. There is a vector $\eta_J = (\eta_{k,J})_{0 \le k \le 2m}$ and a remainder $R(x,J)$ such that

(34)    $F(x,J) = \sum_{k=0}^{2m} \eta_{k,J} \, \varepsilon_{s(J)}^{k} F^{(k)}(x,\theta_J) + R(x,J)$,

where:

(i) $\eta_{0,J} = \sum_{j \in J} \pi_j$ and $|\eta_{k,J}| \preccurlyeq 1$ for all $k \le 2m$;

(ii) Taking subsequences if needed, there is a coefficient of maximal order among the $d_J$ first ones. That is, there is an integer $k(J) < d_J$ such that

$\|\eta_J\| := \max_{k \le 2m} |\eta_{k,J}| \asymp |\eta_{k(J),J}|$;

(iii) The norm $\|\eta_J\|$ is bounded from below (up to a constant) by a quantity linked to the Wasserstein distance:

$\|\eta_J\| \succcurlyeq \max\left( \varepsilon_{\sigma(J)}, \; \max_{I \in \mathrm{Desc}(J)} \varepsilon_{\sigma(I)} \left( \frac{\varepsilon_{s(p(I))}}{\varepsilon_{s(J)}} \right)^{d_J - 1} \right)$;

(iv) The remainder term is negligible. Uniformly in $x$:

$R(x,J) = o\left( \|\eta_J\| \, \varepsilon_{s(J)}^{2m} \right)$.

Proof. See Appendix A.4. □
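3.3.3. Concluding the proof. Let us now consider the root $J_0$ of the tree $\mathcal{T}$. Distinguish two cases:

Case 1. Assume that $s(J_0) < s_{\max}$. We have $\varepsilon_{s(J_0)} = o(1)$ and may apply directly Lemma 3.6 to $J_0$:

$F(x,G_n) = F(x,J_0) = \sum_{k=0}^{2m} \eta_{k,J_0} \, \varepsilon_{s(J_0)}^{k} F^{(k)}(x,\theta_{J_0}) + R(x,J_0)$,

where at least one $\eta_{k,J_0}$ with $k < d_{J_0} \le 2m$ satisfies

$|\eta_{k,J_0}| \succcurlyeq \max_{I \in \mathrm{Desc}(J_0)} \varepsilon_{\sigma(I)} \left( \frac{\varepsilon_{s(p(I))}}{\varepsilon_{s(J_0)}} \right)^{d_{J_0} - 1} \ge \max_{I \in \mathrm{Desc}(J_0)} \varepsilon_{\sigma(I)} \left( \frac{\varepsilon_{s(p(I))}}{\varepsilon_{s(J_0)}} \right)^{2m-1}$

so that one of the coefficients of the derivatives satisfies

$|\eta_{k,J_0} \, \varepsilon_{s(J_0)}^{k}| \succcurlyeq \max_{I \in \mathrm{Desc}(J_0)} \varepsilon_{\sigma(I)} \left( \frac{\varepsilon_{s(p(I))}}{\varepsilon_{s(J_0)}} \right)^{2m-1} \varepsilon_{s(J_0)}^{2m-1} = \max_{I \in \mathrm{Desc}(J_0)} \varepsilon_{\sigma(I)} \, \varepsilon_{s(p(I))}^{2m-1}$.

Thus, taking $m = 1$ in the lower bound (22), and since $R(x,J_0)$ is of smaller order, we get

$\|F(x,G_n)\|_\infty \succcurlyeq \max_{I \in \mathrm{Desc}(J_0)} \varepsilon_{\sigma(I)} \, \varepsilon_{s(p(I))}^{2m-1} \succcurlyeq W(G_{1,n},G_{2,n})^{2m-1}$

where the last inequality comes from Lemma 3.5.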

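Case 2. Assume that $s(J_0) = s_{\max}$. We split $G_n$ over the first-generation children:

$F(x,G_n) = F(x,J_0) = \sum_{I \in \mathrm{Child}(J_0)} F(x,I) = \sum_{I \in \mathrm{Child}(J_0)} \left[ \sum_{k=0}^{2m} \eta_{k,I} \, \varepsilon_{s(I)}^{k} F^{(k)}(x,\theta_I) + R(x,I) \right]$.

Moreover the $\theta_I$ for $I \in \mathrm{Child}(J_0)$ are $\varepsilon$-separated for some $\varepsilon > 0$ (see (45)), so that the lower bound (22) can be applied and yields, since the $R(x,I)$'s are negligible:

$\|F(x,G_n)\|_\infty \succcurlyeq \max_{I \in \mathrm{Child}(J_0)} \max_{k \le 2m} |\eta_{k,I} \, \varepsilon_{s(I)}^{k}| \ge \max_{I \in \mathrm{Child}(J_0)} \max_{k < d_I} |\eta_{k,I} \, \varepsilon_{s(I)}^{k}|$.

On the one hand, we have $\max_{k < d_I} |\eta_{k,I} \, \varepsilon_{s(I)}^{k}| \ge |\eta_{0,I}|$ and since $|\eta_{0,I}| = |\sum_{j \in I} \pi_j| \asymp \varepsilon_{\sigma(I)}$ we deduce

$\|F(x,G_n)\|_\infty \succcurlyeq \max_{I \in \mathrm{Child}(J_0)} \varepsilon_{\sigma(I)}$.

On the other hand, we have $\max_{k < d_I} |\eta_{k,I} \, \varepsilon_{s(I)}^{k}| \succcurlyeq \|\eta_I\| \, \varepsilon_{s(I)}^{d_I - 1}$ so that from Lemma 3.6 (ii) and (iii) for $I$, we deduce further

$\|F(x,G_n)\|_\infty \succcurlyeq \max_{I \in \mathrm{Child}(J_0)} \max_{H \in \mathrm{Desc}(I)} \varepsilon_{\sigma(H)} \left( \frac{\varepsilon_{s(p(H))}}{\varepsilon_{s(I)}} \right)^{d_I - 1} \varepsilon_{s(I)}^{d_I - 1} = \max_{I \in \mathrm{Child}(J_0)} \max_{H \in \mathrm{Desc}(I)} \varepsilon_{\sigma(H)} \, \varepsilon_{s(p(H))}^{d_I - 1}$.

After recalling that $\varepsilon_{s(J_0)} = 1$ and setting $d^* = \max_{I \in \mathrm{Child}(J_0)} d_I$, we may combine these two lower bounds and get

$\|F(x,G_n)\|_\infty \succcurlyeq \max_{I \in \mathrm{Child}(J_0)} \max_{H \in \mathrm{Desc}(I) \cup \{I\}} \varepsilon_{\sigma(H)} \, \varepsilon_{s(p(H))}^{d_I - 1} \succcurlyeq \max_{H \in \mathrm{Desc}(J_0)} \varepsilon_{\sigma(H)} \, \varepsilon_{s(p(H))}^{d^* - 1} \succcurlyeq W(G_{1,n},G_{2,n})^{d^* - 1}$

where the last inequality comes from Lemma 3.5. Since $G_{1,n}$ and $G_{2,n}$ converge to $G_0 \in \mathcal{G}_{m_0}$, the root $J_0$ (of cardinality $m_1 + m_2$) has at least $m_0$ children with at least two elements. Thus, the cardinality $d^*$ of the biggest child is bounded by $m_1 + m_2 - 2(m_0 - 1)$. Thus,

$\|F(x,G_n)\|_\infty \succcurlyeq W(G_{1,n},G_{2,n})^{m_1 + m_2 - 2m_0 + 1} \ge W(G_{1,n},G_{2,n})^{2m - 2m_0 + 1}$.

Finally, if $m_0$ is more than one, we are in the second case (where $s(J_0) = s_{\max}$) and if $m_0$ is one, the two cases can occur. But whatever the case, we always have

$\|F(x,G_n)\|_\infty \succcurlyeq W(G_{1,n},G_{2,n})^{2m - 2m_0 + 1}$

so that (30) is proved.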


4. A class of k-strongly identifiable families. We expect the strong identifiability to be rather generic, and hence the above theory often meaningful. In particular, Chen (1995, Theorem 3) has proved that location and scale families with smooth densities are 2-strongly identifiable. The theorem and the proof straightforwardly generalise to our case. We merely state the result.

Theorem 4.1. Let $k \ge 1$. Let $f$ be a probability density with respect to the Lebesgue measure. Assume that $f$ is $k - 1$ times differentiable with

$\lim_{x \to \pm\infty} f^{(p)}(x) = 0$ for $p \in \llbracket 0, k-1 \rrbracket$.

Set $F(x,\theta) = \int_{-\infty}^{x} f(y - \theta) \, dy$. Then the family $\{F(x,\theta), \theta \in \Theta\}$ is $k$-strongly identifiable. If $\Theta \subset (0,\infty)$, the result stays true with $F(x,\theta) = \int_{-\infty}^{x} \frac{1}{\theta} f\left( \frac{y}{\theta} \right) dy$.
APPENDIX A: AUXILIARY PROOFS
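A.1. Proof of Equation (9). The map

\[
\phi : (\pi_1,\dots,\pi_d,\theta_1,\dots,\theta_d) \mapsto
\Bigl(\sum_{j=1}^{d}\pi_j,\ \sum_{j=1}^{d}\pi_j\theta_j,\ \sum_{j=1}^{d}\pi_j\theta_j^{2},\ \dots,\ \sum_{j=1}^{d}\pi_j\theta_j^{2d-1}\Bigr)
\]

has the following Jacobian:

\[
J(\phi) = (-1)^{\frac{(d-1)d}{2}}\,\pi_1\cdots\pi_d \prod_{1\le j<k\le d} (\theta_j-\theta_k)^4 .
\]

To prove this, note that

\[
J(\phi) =
\begin{vmatrix}
1 & \theta_1 & \theta_1^2 & \cdots & \theta_1^{2d-1}\\
\vdots & \vdots & \vdots & & \vdots\\
1 & \theta_d & \theta_d^2 & \cdots & \theta_d^{2d-1}\\
0 & \pi_1 & 2\pi_1\theta_1 & \cdots & (2d-1)\pi_1\theta_1^{2d-2}\\
\vdots & \vdots & \vdots & & \vdots\\
0 & \pi_d & 2\pi_d\theta_d & \cdots & (2d-1)\pi_d\theta_d^{2d-2}
\end{vmatrix}
= \pi_1\cdots\pi_d\,\Delta_d
\]

with

\[
\Delta_d =
\begin{vmatrix}
1 & \theta_1 & \cdots & \theta_1^{2d-2} & \theta_1^{2d-1}\\
\vdots & \vdots & & \vdots & \vdots\\
1 & \theta_d & \cdots & \theta_d^{2d-2} & \theta_d^{2d-1}\\
0 & 1 & \cdots & (2d-2)\theta_1^{2d-3} & (2d-1)\theta_1^{2d-2}\\
\vdots & \vdots & & \vdots & \vdots\\
0 & 1 & \cdots & (2d-2)\theta_d^{2d-3} & (2d-1)\theta_d^{2d-2}
\end{vmatrix}
=
\begin{vmatrix}
1 & \theta_1 & \cdots & \theta_1^{2d-2} & P(\theta_1)\\
\vdots & \vdots & & \vdots & \vdots\\
1 & \theta_d & \cdots & \theta_d^{2d-2} & P(\theta_d)\\
0 & 1 & \cdots & (2d-2)\theta_1^{2d-3} & P'(\theta_1)\\
\vdots & \vdots & & \vdots & \vdots\\
0 & 1 & \cdots & (2d-2)\theta_d^{2d-3} & P'(\theta_d)
\end{vmatrix}
\]

where $P$ can be any (normalized) polynomial of degree $2d-1$ and $P'$ its
derivative. Choosing $P(\theta) = (\theta-\theta_d)\prod_{i=1}^{d-1}(\theta-\theta_i)^2$, every entry of the last column vanishes except $P'(\theta_d)$, and we get

\[
\Delta_d = P'(\theta_d)
\begin{vmatrix}
1 & \theta_1 & \cdots & \theta_1^{2d-3} & \theta_1^{2d-2}\\
\vdots & \vdots & & \vdots & \vdots\\
1 & \theta_d & \cdots & \theta_d^{2d-3} & \theta_d^{2d-2}\\
0 & 1 & \cdots & (2d-3)\theta_1^{2d-4} & (2d-2)\theta_1^{2d-3}\\
\vdots & \vdots & & \vdots & \vdots\\
0 & 1 & \cdots & (2d-3)\theta_{d-1}^{2d-4} & (2d-2)\theta_{d-1}^{2d-3}
\end{vmatrix}
= P'(\theta_d)
\begin{vmatrix}
1 & \theta_1 & \cdots & \theta_1^{2d-3} & Q(\theta_1)\\
\vdots & \vdots & & \vdots & \vdots\\
1 & \theta_d & \cdots & \theta_d^{2d-3} & Q(\theta_d)\\
0 & 1 & \cdots & (2d-3)\theta_1^{2d-4} & Q'(\theta_1)\\
\vdots & \vdots & & \vdots & \vdots\\
0 & 1 & \cdots & (2d-3)\theta_{d-1}^{2d-4} & Q'(\theta_{d-1})
\end{vmatrix}
\]

where $Q$ is any (normalized) polynomial of degree $2d-2$. With $Q(\theta) = \prod_{i=1}^{d-1}(\theta-\theta_i)^2$
we obtain

\[
\Delta_d = (-1)^{d-1} P'(\theta_d)\,Q(\theta_d)\,\Delta_{d-1} = (-1)^{d-1} \prod_{i=1}^{d-1} (\theta_d-\theta_i)^4\,\Delta_{d-1}.
\]

By iteration, we get

\[
\begin{aligned}
\Delta_d &= (-1)^{d-1} \prod_{j=1}^{d-1} (\theta_d-\theta_j)^4\ (-1)^{d-2} \prod_{j=1}^{d-2} (\theta_{d-1}-\theta_j)^4\ \Delta_{d-2}\\
&= (-1)^{(d-1)+(d-2)+\cdots+1} \prod_{k=2}^{d}\prod_{j=1}^{k-1} (\theta_k-\theta_j)^4\ \Delta_1\\
&= (-1)^{\frac{(d-1)d}{2}} \prod_{1\le j<k\le d} (\theta_j-\theta_k)^4
\end{aligned}
\]

since $\Delta_1 = 1$. The proof is complete.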
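The closed form for the Jacobian can be sanity-checked numerically. The following is only a sketch in exact rational arithmetic, not part of the proof: the helper names `det`, `jacobian_matrix` and `closed_form` are ours, and the matrix is laid out with one row per moment and one column per variable, which does not change the determinant.

```python
from fractions import Fraction
from itertools import combinations


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Look for a row with a non-zero pivot in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def jacobian_matrix(pis, thetas):
    """Row k holds the partial derivatives of sum_j pi_j * theta_j**k,
    first with respect to pi_1..pi_d, then with respect to theta_1..theta_d."""
    d = len(pis)
    rows = []
    for k in range(2 * d):
        d_pi = [t ** k for t in thetas]
        d_theta = [k * p * t ** (k - 1) if k > 0 else 0
                   for p, t in zip(pis, thetas)]
        rows.append(d_pi + d_theta)
    return rows


def closed_form(pis, thetas):
    """(-1)^{d(d-1)/2} * pi_1...pi_d * prod_{j<k} (theta_j - theta_k)^4."""
    d = len(pis)
    value = Fraction((-1) ** (d * (d - 1) // 2))
    for p in pis:
        value *= p
    for j, k in combinations(range(d), 2):
        value *= (thetas[j] - thetas[k]) ** 4
    return value
```

For instance, with `pis = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]` and `thetas = [Fraction(0), Fraction(1), Fraction(3)]`, comparing `det(jacobian_matrix(pis, thetas))` with `closed_form(pis, thetas)` checks the case $d = 3$.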


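A.2. Auxiliary matrix tool.

Lemma A.1. Suppose $j$, $d_1,\dots,d_j$ and $d$ are positive integers such that $d_1 + \cdots + d_j = d$.
Consider numbers $\theta_1,\dots,\theta_j$ all distinct. Write

\[
\mathcal{I} = \{(i,\ell) \in \mathbb{N}^2 : 1 \le i \le j,\ 1 \le \ell \le d_i\}.
\]

Define for each $(i,\ell) \in \mathcal{I}$ a $d$-dimensional column vector $a_{i,\ell}$ as follows:

\[
a_{i,\ell}[k] = \frac{\theta_i^{k-\ell}}{(k-\ell)!}\,\mathbf{1}_{\{k\ge \ell\}}, \qquad 1 \le k \le d,
\]

and stack these vectors in a $d \times d$ matrix

\[
A(\theta_1,\dots,\theta_j) = \bigl[a_{i,\ell}\bigr]_{(i,\ell)\in\mathcal{I}}. \tag{35}
\]

Then, the rank of $A(\theta_1,\dots,\theta_j)$ is $d$.

Proof. Set for short $A = A(\theta_1,\dots,\theta_j)$. Let $\Lambda = (\lambda_{i,\ell})_{(i,\ell)\in\mathcal{I}}$ be a vector
such that $A\Lambda = 0$. Proving the lemma is equivalent to proving that $\Lambda = 0$.
Note that

\[
(A\Lambda)_k = \sum_{(i,\ell)\in\mathcal{I}} \lambda_{i,\ell}\,a_{i,\ell}[k] = \sum_{i=1}^{j}\ \sum_{\ell=1}^{d_i\wedge k} \lambda_{i,\ell}\,\frac{\theta_i^{k-\ell}}{(k-\ell)!}, \tag{36}
\]

and for any $(d-1)$-degree polynomial $P(x) = \sum_{k=0}^{d-1} c_k \frac{x^k}{k!}$ we have

\[
(c_0,\dots,c_{d-1})\,A\Lambda = \sum_{k=0}^{d-1} c_k\,(A\Lambda)_{k+1} = \sum_{(i,\ell)\in\mathcal{I}} \lambda_{i,\ell}\,P^{(\ell-1)}(\theta_i). \tag{37}
\]

Hence, if $A\Lambda = 0$, then (37) is zero. In particular, the $(d-1)$-degree polynomials

\[
P_k(x) = (x-\theta_k)^{d_k-1} \prod_{\substack{i=1\\ i\neq k}}^{j} (x-\theta_i)^{d_i}, \qquad 1 \le k \le j,
\]

yield

\[
\sum_{(i,\ell)\in\mathcal{I}} \lambda_{i,\ell}\,P_k^{(\ell-1)}(\theta_i) = \lambda_{k,d_k}\,(d_k-1)! \prod_{\substack{i=1\\ i\neq k}}^{j} (\theta_k-\theta_i)^{d_i} = 0,
\]

so that $\lambda_{k,d_k} = 0$. More generally,

\[
P_{k,q}(x) = (x-\theta_k)^{d_k-q} \prod_{\substack{i=1\\ i\neq k}}^{j} (x-\theta_i)^{d_i} \qquad (1 \le q \le d_k)
\]

has the property that, for $(i,\ell) \in \mathcal{I}$, $P_{k,q}^{(\ell-1)}(\theta_i)$ is zero if $i \neq k$ or $\ell \le d_k - q$.
On the other hand this term is never zero if $i = k$ and $\ell = d_k - q + 1$. So
that recurrence on $q$ yields $\lambda_{k,d_k-q+1} = 0$, hence $\Lambda = 0$. □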
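Lemma A.1 can be illustrated on small instances. The sketch below builds the matrix with entries $\theta_i^{k-\ell}/(k-\ell)!$ for $k \ge \ell$ (and zero otherwise) in exact arithmetic and computes its rank by elimination. The function names `confluent_matrix` and `rank` are ours and purely illustrative.

```python
from fractions import Fraction
from math import factorial


def confluent_matrix(thetas, mults):
    """d x d matrix whose column (i, l), 1 <= l <= d_i, has k-th entry
    theta_i**(k - l) / (k - l)! when k >= l and 0 otherwise (k = 1..d)."""
    d = sum(mults)
    columns = []
    for theta, d_i in zip(thetas, mults):
        for l in range(1, d_i + 1):
            columns.append([
                Fraction(theta) ** (k - l) / factorial(k - l) if k >= l
                else Fraction(0)
                for k in range(1, d + 1)
            ])
    # Turn the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]


def rank(matrix):
    """Rank over the rationals, by Gaussian elimination."""
    a = [row[:] for row in matrix]
    n_rows, n_cols = len(a), len(a[0])
    r = 0
    for col in range(n_cols):
        if r == n_rows:
            break
        pivot = next((i for i in range(r, n_rows) if a[i][col] != 0), None)
        if pivot is None:
            continue
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(r + 1, n_rows):
            factor = a[i][col] / a[r][col]
            for c in range(col, n_cols):
                a[i][c] -= factor * a[r][c]
        r += 1
    return r
```

With distinct nodes, for example `rank(confluent_matrix([0, 1, -2], [2, 1, 3]))`, the rank equals $d = 6$. Repeating a node, as in `confluent_matrix([1, 1], [1, 1])`, produces two identical columns, so the distinctness assumption cannot be dropped.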
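Definition A.2. Let $\varepsilon > 0$. A vector $(\theta_i)_{1\le i\le j}$ in $\Theta^j$ is said $\varepsilon$-separated if

\[
\forall\, i \neq i', \quad |\theta_i - \theta_{i'}| \ge \varepsilon .
\]

Corollary A.3. Let $\varepsilon > 0$. There exist constants $c > 0$ and $C > 0$ such
that for any vector $\Lambda$ and any $\varepsilon$-separated vector $(\theta_i)_{1\le i\le j}$,

\[
c\,\|\Lambda\| \le \|A(\theta_1,\dots,\theta_j)\Lambda\| \le C\,\|\Lambda\|,
\]

where $A(\theta_1,\dots,\theta_j)$ is as in (35).

Proof. Note first that the set $P_\varepsilon$ of all $\varepsilon$-separated families $(\theta_i)_{1\le i\le j}$ is
compact in $\Theta^j$. Moreover the norm $\|A(\theta_1,\dots,\theta_j)\Lambda\|$ is a continuous function
of $((\theta_1,\dots,\theta_j),\Lambda)$ on the compact space $P_\varepsilon \times S(0,1)$ where $S(0,1)$ is the
$d$-dimensional unit sphere. Its infimum is attained on $P_\varepsilon \times S(0,1)$, say at
$((\theta_1^*,\dots,\theta_j^*),\Lambda^*)$. Now, by Lemma A.1, $c := \|A(\theta_1^*,\dots,\theta_j^*)\Lambda^*\|$ is positive so
that $c\,\|\Lambda\| \le \|A(\theta_1,\dots,\theta_j)\Lambda\|$ for every $\Lambda$ and every $(\theta_1,\dots,\theta_j)$ in $P_\varepsilon$.

Conversely, $C$ is easily bounded from above by the sum of the norms of
the matrix entries, and all those are bounded since $\Theta$ is compact. □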

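A.3. Proof of Lemma 3.5. We shall estimate $W(G_{1,n},G_{2,n})$ with the
comparison scale and the tree $T$. Set for any function $f$ on $\Theta$ and any $J \subset J_0$

\[
f(J) = \sum_{j\in J} \pi_{j,n}\,f(\theta_{j,n}). \tag{38}
\]

In what follows the subscript $n$ is fixed and thus skipped in the $\theta_j$'s, $\pi_j$'s and
$\varepsilon_s$'s. Recall that the collection of distinct ultrametric balls $J = \{k : \delta(k,j) \le s\}$
for $j$ varying in $J_0$ and $s$ in $\{0,\dots,s_{\max}\}$ form a tree $T$. For each distinct
$J$, we picked an arbitrary $j \in J$ and set $\theta_J = \theta_j$. Set also for short

\[
\pi(J) = \sum_{j\in J} \pi_j .
\]

Let $f$ be 1-Lipschitz on $\Theta$. We first prove by recurrence that for any vertex
$J$ of the tree,

\[
f(J) \lesssim \pi(J)\,f(\theta_J) + \max_{H\in\mathrm{Desc}(J)} \varepsilon_{\sigma(H)}\,\varepsilon_{s(p(H))}. \tag{39}
\]

If $J$ has $s$-diameter zero, then $f(J) = \pi(J) f(\theta_J)$ and (39) is satisfied. Next,
if $J$ has children $I$ that satisfy (39), we compute

\[
\begin{aligned}
f(J) &= \sum_{I\in\mathrm{Child}(J)} f(I)
\lesssim \sum_{I\in\mathrm{Child}(J)} \Bigl[\pi(I)\,f(\theta_I) + \max_{H\in\mathrm{Desc}(I)} \varepsilon_{\sigma(H)}\,\varepsilon_{s(p(H))}\Bigr]\\
&= \pi(J)\,f(\theta_J) + \sum_{I\in\mathrm{Child}(J)} \Bigl[\pi(I)\bigl(f(\theta_I)-f(\theta_J)\bigr) + \max_{H\in\mathrm{Desc}(I)} \varepsilon_{\sigma(H)}\,\varepsilon_{s(p(H))}\Bigr].
\end{aligned}
\]

Since $|\pi(I)|$ is of order $\varepsilon_{\sigma(I)}$ and $|\theta_I - \theta_J|$ is of order $\varepsilon_{s(J)}$, we see that (39)
holds for $J$ and in particular for $J_0$, for which we have $\pi(J_0) = 0$.

To prove the reverse inequality, let $J \neq J_0$ such that $\varepsilon_{\sigma(J)}\,\varepsilon_{s(p(J))}$ is maximal.
Set $e(J) = \min_{j\notin J} |\theta_j - \theta_J|$, which is bigger than the $s$-diameter $s(J)$ of
$J$, and consider a 1-Lipschitz function $f$ on $\Theta$ such that $f(J) = 0$ and
$f(J_0 \setminus J) = |\pi(J)|\,[e(J) - s(J)]$, for instance

\[
f(\theta) = -\operatorname{sgn}(\pi(J)) \times \min\bigl\{e(J) - s(J),\ [\,|\theta - \theta_J| - s(J)\,]_+\bigr\}
\]

which satisfies $f(\theta_j) = -\operatorname{sgn}(\pi(J))\,[e(J) - s(J)]\,\mathbf{1}_{j\notin J}$. We get

\[
f(J_0) = f(J_0 \setminus J) + f(J) = |\pi(J)|\,[e(J) - s(J)]
\]

and since $|\pi(J)|$ is of order $\varepsilon_{\sigma(J)}$ and $e(J)$ is of order $\varepsilon_{s(p(J))}$ at least, we
deduce

\[
\sup_{\|f\|_{\mathrm{Lip}}\le 1} f(J_0) \gtrsim \max_{J\neq J_0} \varepsilon_{\sigma(J)}\,\varepsilon_{s(p(J))}.
\]

It remains to note that (by the Kantorovich-Rubinstein Theorem)

\[
W(G_{1,n},G_{2,n}) = \sup_{\|f\|_{\mathrm{Lip}}\le 1} \int_\Theta f(\theta)\,dG_n(\theta) = \sup_{\|f\|_{\mathrm{Lip}}\le 1} f(J_0).
\]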
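As a small illustration of the two descriptions of $W$ used here, the sketch below computes the distance between two finitely supported mixing distributions on the real line from the distribution-function formula (2), and evaluates the pairing $\int f\,d(G_1 - G_2)$ for a chosen 1-Lipschitz $f$. The representation of a mixing distribution as a list of `(theta, weight)` pairs and the function names are assumptions made for this example only.

```python
from fractions import Fraction


def wasserstein_1d(atoms_1, atoms_2):
    """Integral over t of |G_1(-inf, t] - G_2(-inf, t]| for two discrete
    distributions, each given as a list of (theta, weight) pairs."""
    # Signed measure G_1 - G_2, with atoms sorted by location.
    signed = [(t, w) for t, w in atoms_1] + [(t, -w) for t, w in atoms_2]
    signed.sort(key=lambda tw: tw[0])
    total, cum = 0, 0
    for (t, w), (t_next, _) in zip(signed, signed[1:]):
        cum += w  # difference of the two distribution functions on [t, t_next)
        total += abs(cum) * (t_next - t)
    return total


def pairing(f, atoms_1, atoms_2):
    """Integral of f against G_1 - G_2."""
    return (sum(w * f(t) for t, w in atoms_1)
            - sum(w * f(t) for t, w in atoms_2))
```

For example, take $G_1 = \delta_0$ and $G_2 = \tfrac12\delta_{-h} + \tfrac12\delta_{h}$ with $h = 1/10$. Then `wasserstein_1d` returns $h$, the 1-Lipschitz function $f(\theta) = -|\theta|$ attains this value in `pairing`, and $f(\theta) = \theta$ gives the smaller value $0$.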

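A.4. Proof of Lemma 3.6. Recall that we set $\pi(J) = \sum_{j\in J} \pi_j$. We
use definition (38) for $f(\theta) = F(x,\theta)$. If $J$ satisfies $s(J) = 0$, then all the
$\theta_j$ for $j \in J$ are equal and $F(x,J) = \pi(J) F(x,\theta_J)$. In this case, the choice
$\eta_{k,J} = \pi(J)\,\mathbf{1}_{\{k=0\}}$ and $R(x,J) = 0$ works.

Assume now that Lemma 3.6 holds for any vertex $I$ with parent $J$ in the
tree $T$. We write a Taylor expansion to pass the estimates of $I$ to the parent
$J$. By assumption,

\[
F(x,I) = \sum_{\ell=0}^{2m} \eta_{\ell,I}\,\varepsilon_{s(I)}^{\ell}\,F^{(\ell)}(x,\theta_I) + R(x,I). \tag{40}
\]

Assuming without loss of generality that $\theta_J \le \theta_I$, we apply Taylor's formula
with remainder to $F^{(\ell)}(x,\theta_I)$ at $\theta_J$ and obtain

\[
F^{(\ell)}(x,\theta_I) = \sum_{k=\ell}^{2m-1} \frac{(\theta_I-\theta_J)^{k-\ell}}{(k-\ell)!}\,F^{(k)}(x,\theta_J) + \int_{\theta_J}^{\theta_I} \frac{(\theta_I-\xi)^{2m-1-\ell}}{(2m-1-\ell)!}\,F^{(2m)}(x,\xi)\,d\xi .
\]

So that

\[
\begin{aligned}
F^{(\ell)}(x,\theta_I) - \sum_{k=\ell}^{2m} \frac{(\theta_I-\theta_J)^{k-\ell}}{(k-\ell)!}\,F^{(k)}(x,\theta_J)
&= \int_{\theta_J}^{\theta_I} \frac{(\theta_I-\xi)^{2m-1-\ell}}{(2m-1-\ell)!}\,\bigl[F^{(2m)}(x,\xi) - F^{(2m)}(x,\theta_J)\bigr]\,d\xi\\
&= (\theta_I-\theta_J)^{2m-\ell}\ O\Bigl(\sup_{\xi\in[\theta_J,\theta_I]} \bigl|F^{(2m)}(x,\xi) - F^{(2m)}(x,\theta_J)\bigr|\Bigr)\\
&= (\theta_I-\theta_J)^{2m-\ell}\ o(1),
\end{aligned}
\]

where we used assumption (23) in the last equality. Setting now

\[
\alpha_{\ell,I} = \eta_{\ell,I} \Bigl(\frac{\varepsilon_{s(I)}}{\varepsilon_{s(J)}}\Bigr)^{\ell}, \qquad h_I = \frac{\theta_I - \theta_J}{\varepsilon_{s(J)}}, \tag{41}
\]

we obtain

\[
\eta_{\ell,I}\,\varepsilon_{s(I)}^{\ell}\,F^{(\ell)}(x,\theta_I) = \sum_{k=\ell}^{2m} \alpha_{\ell,I}\,\frac{h_I^{k-\ell}}{(k-\ell)!}\,\varepsilon_{s(J)}^{k}\,F^{(k)}(x,\theta_J) + \varepsilon_{s(J)}^{2m}\,|\alpha_{\ell,I}|\,o(1)
\]

and substituting in (40), we get

\[
F(x,I) = \sum_{k=0}^{2m} \Bigl(\sum_{\ell=0}^{k} \alpha_{\ell,I}\,\frac{h_I^{k-\ell}}{(k-\ell)!}\Bigr) \varepsilon_{s(J)}^{k}\,F^{(k)}(x,\theta_J) + \varepsilon_{s(J)}^{2m} \max_{\ell\le 2m} |\alpha_{\ell,I}|\,o(1) + R(x,I).
\]

Adding up $F(x,I)$ over the children $I$ of $J$ gives (34), i.e.

\[
F(x,J) = \sum_{k=0}^{2m} \eta_{k,J}\,\varepsilon_{s(J)}^{k}\,F^{(k)}(x,\theta_J) + R(x,J),
\]

with

\[
\eta_{k,J} = \sum_{I\in\mathrm{Child}(J)}\ \sum_{\ell=0}^{k} \alpha_{\ell,I}\,\frac{h_I^{k-\ell}}{(k-\ell)!} \tag{42}
\]

and

\[
R(x,J) = \sum_{I\in\mathrm{Child}(J)} \Bigl[\varepsilon_{s(J)}^{2m} \max_{\ell\le 2m} |\alpha_{\ell,I}|\,o(1) + R(x,I)\Bigr]. \tag{43}
\]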
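The re-centring step can be checked exactly when the remainder vanishes, that is, when $\theta \mapsto F(x,\theta)$ is replaced by a polynomial of degree at most $2m$. The sketch below does this for a single child $I$. It assumes the rescaling $\alpha_{\ell,I} = \eta_{\ell,I}(\varepsilon_{s(I)}/\varepsilon_{s(J)})^{\ell}$, $h_I = (\theta_I-\theta_J)/\varepsilon_{s(J)}$ and the contribution $\sum_{\ell\le k}\alpha_{\ell,I} h_I^{k-\ell}/(k-\ell)!$ of $I$ to $\eta_{k,J}$. All function and variable names are ours.

```python
from fractions import Fraction
from math import factorial


def derivative_at(coeffs, order, theta):
    """Value at theta of the order-th derivative of sum_n coeffs[n] * theta**n."""
    return sum(
        c * Fraction(factorial(n), factorial(n - order)) * theta ** (n - order)
        for n, c in enumerate(coeffs) if n >= order
    )


def recentre(eta_I, eps_I, eps_J, theta_I, theta_J):
    """Coefficients at the parent's centre and scale contributed by one child."""
    ratio = eps_I / eps_J
    h = (theta_I - theta_J) / eps_J
    alpha = [e * ratio ** l for l, e in enumerate(eta_I)]
    return [
        sum(alpha[l] * h ** (k - l) / factorial(k - l) for l in range(k + 1))
        for k in range(len(eta_I))
    ]


def expansion(eta, eps, theta, coeffs):
    """sum_k eta[k] * eps**k * F^{(k)}(theta) for the polynomial F = coeffs."""
    return sum(e * eps ** k * derivative_at(coeffs, k, theta)
               for k, e in enumerate(eta))
```

With $m = 2$, any five rational coefficients `eta_I` and any polynomial `coeffs` of degree at most 4, `expansion(eta_I, eps_I, theta_I, coeffs)` equals `expansion(recentre(eta_I, eps_I, eps_J, theta_I, theta_J), eps_J, theta_J, coeffs)`. For a general $F$ the two sides differ by the remainder term controlled in the proof.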


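We first prove (i) for $J$. From (42) for $k = 0$ and (41) and recurrence
hypothesis on $I$, we have

\[
\eta_{0,J} = \sum_{I\in\mathrm{Child}(J)} \alpha_{0,I} = \sum_{I\in\mathrm{Child}(J)} \eta_{0,I} = \sum_{I\in\mathrm{Child}(J)}\ \sum_{j\in I} \pi_j = \sum_{j\in J} \pi_j .
\]

Moreover, since $|h_I| \lesssim 1$ for each child $I$ of $J$, Equation (42) yields

\[
|\eta_{k,J}| \lesssim \max_{\substack{\ell\le k\\ I\in\mathrm{Child}(J)}} |\alpha_{\ell,I}|.
\]

Furthermore, from (41) we have $|\alpha_{\ell,I}| \le |\eta_{\ell,I}|$ since $\varepsilon_{s(I)} \le \varepsilon_{s(J)}$. By assumption
on $I$, we have $|\eta_{\ell,I}| \lesssim 1$ so that $|\alpha_{\ell,I}|$ and thus $|\eta_{k,J}|$ are $O(1)$ and (i) is
established.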

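We turn to the proof of (ii). The first step is to show that

\[
\max_{k<d_J} |\eta_{k,J}| \gtrsim \max_{\substack{\ell<d_I\\ I\in\mathrm{Child}(J)}} |\alpha_{\ell,I}|. \tag{44}
\]

To this end, note that (41) gives, for any two distinct children $I$ and $I'$ of $J$:

\[
|h_I - h_{I'}| = \frac{|\theta_I - \theta_{I'}|}{\varepsilon_{s(J)}} \asymp 1. \tag{45}
\]

The finite set $\{h_I\}_{I\in\mathrm{Child}(J)}$ is thus $\varepsilon$-separated for some $\varepsilon > 0$. Hence, if we
set $\Lambda = (\alpha_{\ell,I})_{0\le \ell\le d_I-1,\ I\in\mathrm{Child}(J)}$, we get by Corollary A.3

\[
\max_{k<d_J} \Bigl|\sum_{I\in\mathrm{Child}(J)}\ \sum_{\ell=0}^{(d_I-1)\wedge k} \alpha_{\ell,I}\,\frac{h_I^{k-\ell}}{(k-\ell)!}\Bigr| \gtrsim \max_{\substack{\ell<d_I\\ I\in\mathrm{Child}(J)}} |\alpha_{\ell,I}|.
\]

Now, to obtain (44), we see from (42) that it's enough to show

\[
\forall\, I \in \mathrm{Child}(J),\ k \ge d_I, \qquad \Bigl|\sum_{\ell=d_I}^{k} \alpha_{\ell,I}\,\frac{h_I^{k-\ell}}{(k-\ell)!}\Bigr| = o\Bigl(\max_{\ell<d_I} |\alpha_{\ell,I}|\Bigr). \tag{46}
\]

Since $|h_I| \lesssim 1$, we have

\[
\Bigl|\sum_{\ell=d_I}^{k} \alpha_{\ell,I}\,\frac{h_I^{k-\ell}}{(k-\ell)!}\Bigr| \lesssim \max_{d_I\le \ell\le k} |\alpha_{\ell,I}|. \tag{47}
\]

By assumption on $I$, we also have $\|\eta_I\| \asymp \max_{\ell<d_I} |\eta_{\ell,I}|$ so that, for $\ell \ge d_I$,

\[
|\alpha_{\ell,I}| = |\eta_{\ell,I}| \Bigl(\frac{\varepsilon_{s(I)}}{\varepsilon_{s(J)}}\Bigr)^{\ell} \lesssim \Bigl(\frac{\varepsilon_{s(I)}}{\varepsilon_{s(J)}}\Bigr)^{d_I} \max_{\ell'<d_I} |\eta_{\ell',I}| \le \frac{\varepsilon_{s(I)}}{\varepsilon_{s(J)}} \max_{\ell'<d_I} |\alpha_{\ell',I}|,
\]

where the last inequality comes from (41). Thus

\[
\max_{d_I\le \ell\le k} |\alpha_{\ell,I}| \lesssim \frac{\varepsilon_{s(I)}}{\varepsilon_{s(J)}} \max_{\ell<d_I} |\alpha_{\ell,I}| = o\Bigl(\max_{\ell<d_I} |\alpha_{\ell,I}|\Bigr), \tag{48}
\]

so that (47) and (48) yield (46) and (44) is proved.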
The second step is to prove

\[
\|\eta_J\| \asymp \max_{k<d_J} |\eta_{k,J}|. \tag{49}
\]

The non-trivial part is $\|\eta_J\| \lesssim \max_{k<d_J} |\eta_{k,J}|$. By the definition (42) of $\eta_{k,J}$,
(48) and (44), we have

\[
\max_{k\ge d_J} |\eta_{k,J}| \lesssim \max_{\substack{k\ge d_J\\ I\in\mathrm{Child}(J)}}\ \max_{\ell\le k} |\alpha_{\ell,I}| \lesssim \max_{\substack{\ell<d_I\\ I\in\mathrm{Child}(J)}} |\alpha_{\ell,I}| \lesssim \max_{k<d_J} |\eta_{k,J}|.
\]

The proof of (ii) is complete.

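We turn to the proof of (iii). From (49), (44) and (41), we get

\[
\max_{k<d_J} |\eta_{k,J}| \gtrsim \max_{\substack{\ell<d_I\\ I\in\mathrm{Child}(J)}} |\alpha_{\ell,I}| \ge \max_{\substack{\ell<d_I\\ I\in\mathrm{Child}(J)}} |\eta_{\ell,I}| \Bigl(\frac{\varepsilon_{s(I)}}{\varepsilon_{s(J)}}\Bigr)^{d_I-1} \gtrsim \max_{I\in\mathrm{Child}(J)} \|\eta_I\| \Bigl(\frac{\varepsilon_{s(I)}}{\varepsilon_{s(J)}}\Bigr)^{d_J-1}. \tag{50}
\]

Let $H$ be a descendant of $I$, a child of $J$. Assumption (iii) on $I$ then yields

\[
\|\eta_I\| \Bigl(\frac{\varepsilon_{s(I)}}{\varepsilon_{s(J)}}\Bigr)^{d_J-1} \gtrsim \varepsilon_{\sigma(H)} \Bigl(\frac{\varepsilon_{s(p(H))}}{\varepsilon_{s(I)}}\Bigr)^{d_I-1} \Bigl(\frac{\varepsilon_{s(I)}}{\varepsilon_{s(J)}}\Bigr)^{d_J-1} \ge \varepsilon_{\sigma(H)} \Bigl(\frac{\varepsilon_{s(p(H))}}{\varepsilon_{s(J)}}\Bigr)^{d_J-1}.
\]

Moreover (i) implies

\[
\|\eta_J\| \ge |\eta_{0,J}| = |\pi(J)| \asymp \varepsilon_{\sigma(J)}.
\]

Similarly, from (44) and (i) for $I$,

\[
\|\eta_J\| \gtrsim |\alpha_{0,I}| = |\eta_{0,I}| = |\pi(I)| \gtrsim \varepsilon_{\sigma(I)},
\]

so that (iii) is established for $J$.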

We finally prove (iv). From (43), (48), assumption (iv) for I and (44), we 
have 


i?(x, J) ^ 


max 

/GChild(J) 


max \aij\ o(l) + R{x, I) 


i<2m 


4 


max 

/GChild(J) 


max \aij\ o(l) + o (||??/||ej7)) 


and in addition, for each child I of J, from (50), 




and we are done. 